Big Data Solution Designer Resume
SUMMARY
- 10+ years of experience in Designing and Implementing Customer Intranet and Extranet Portal solutions for various clients in different sectors.
- More than Four years of experience in Hadoop Development/Administration built on six years of experience in Java Application Development.
- Good knowledge of and SME-level expertise in the Hadoop ecosystem.
- Experienced in working with Big Data and the Hadoop Distributed File System (HDFS).
- Three years of experience installing, configuring, and testing Hadoop ecosystem components.
- Hands-on experience working with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie.
- Strong knowledge of Hadoop, Hive, and Hive's analytical functions.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Efficient in building Hive, Pig, and MapReduce scripts.
- Implemented proofs of concept on the Hadoop stack and different big data analytic tools, including migration from different databases (i.e., MySQL, Cassandra) to Hadoop.
- Successfully loaded files into Hive and HDFS from MongoDB, Cassandra, and HBase.
- Loaded datasets into Hive for ETL operations.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Experience in using DbVisualizer, ZooKeeper, and Cloudera Manager.
- Worked in different areas including Collaboration, Document Management, Portal and Web Content Management.
- Expert in SharePoint Governance and used Best Practices for numerous client requirements.
- Implemented Taxonomy and Folksonomy for clients
- Well-rounded experience across the full software development life cycle.
- Excellent analytical, problem solving, and communication skills.
- Design and development skills for multi-tiered web applications; experienced in developing and implementing web applications using VB.NET and Visual Studio .NET; Brainbench Certified for ASP.NET.
- Microsoft Certified Professional
- Certified Enterprise Architect from University of Toronto.
- Certified Hadoop Developer and Administrator.
TECHNICAL SKILLS
Programming Languages: .NET, VB.NET, C#.NET, ASP.NET, JAVA, J2EE, C++, C, Visual Basic 6.0, Python, R
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Giraph, Tableau, SAS.
UNIX Tools: Apache, Yum, RPM
Web Technologies: ASP.NET, SharePoint Apps, AJAX, jQuery, InfoPath 2010, JSP, PHP, XML, XSLT, XPath, VBScript, JavaScript, Unified Access Gateway (UAG), BHOLD, SPML, ADFS, Bash
Database Technologies: SQL Server 2005/2008, SQL Server Reporting Services, Forefront Identity Management 2010, Report Builder 3.0, Crystal Reports, Oracle 8i/9i/10g, DB2, MySQL, Cassandra, MongoDB
Operating Systems: Linux AS3.0, Solaris, WINDOWS NT/2000/XP/2003
Enterprise Applications: Forefront Identity Management 2010, Unified Access Gateway, Documentum, MQ Series, Siebel, eRoom, MS Office 2003/2007, MS FrontPage 2003
Web Services: SOAP, DAML-S, BPEL4WS, FIM Web Service, Kerberos
Source Control Systems: Visual Source Safe, Subversion, PVCS, ClearCase, Team Foundation Server.
Application Servers: Tomcat, BEA WebLogic, IBM WebSphere, MKS
PROFESSIONAL EXPERIENCE
Confidential
Big Data Solution Designer
Responsibilities:
- Leveraged Hadoop and a Cloudera cluster to perform large-scale data extraction and ingestion into a data lake
- Applied regression and reinforcement learning techniques on projects for predictive analytics
- Implemented neural network use cases using CNNs and RNNs.
- Used Python/PySpark and NoSQL databases to create models and predictions for the data.
- Converted SAS POC code into Python/PySpark for data analysis on legacy data
- Performed predictive modeling and feature selection on large datasets in HDFS
- Built sentiment analysis for chatbots in Scala and Python.
- Developed a Scala project to profile pdsium entities and flow them into the HDFS data lake
- Analyzed Hadoop clusters using the Big Data ecosystem and its tools, including Hive, Spark, and MapReduce
- Conducted in-depth research on Hive to analyze partitioned and bucketed data
- Leveraged Sqoop to import data from RDBMS into HDFS
- Collaborated on Java projects to bring data from various sources into Hive/HDFS.
- Used CI/CD with Jenkins and Agile methodologies on the projects
- Developed an ETL framework using data warehousing and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
- Performed cleaning and filtering on imported data using Hive and MapReduce.
- Exported an HDFS Parquet file of customer data from HDFS to MySQL.
- Analyzed web/application real-time log data and streamed it to HDFS using Kafka/Spark Streaming (a sketch follows this list).
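A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS log flow described above; the broker address, topic name, and HDFS paths are illustrative placeholders, not values from the original projects.

```python
# Hypothetical sketch: stream application logs from Kafka into HDFS as Parquet.
# Broker, topic, and paths are placeholders; the spark-sql-kafka connector must be on the classpath.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("log-ingest-sketch")
         .getOrCreate())

# Read the raw log stream from Kafka.
logs = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "app-logs")
        .load()
        .selectExpr("CAST(value AS STRING) AS line", "timestamp"))

# Land the stream in the data lake as Parquet, checkpointing for fault tolerance.
query = (logs.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/lake/raw/app_logs")
         .option("checkpointLocation", "hdfs:///tmp/checkpoints/app_logs")
         .start())

query.awaitTermination()
```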
Confidential
Responsibilities:
- Applied regression and reinforcement learning techniques on projects for predictive analytics
- Implemented neural network use cases using CNNs and RNNs.
- Used Keras and TensorFlow to model deep learning datasets and predict results for various financial and non-financial projects.
- Created option-plan financing simulation and prediction models using scikit-learn, PySpark, Hadoop, and supervised learning
- Created ML pipelines for analytics on datasets using regression and classification models such as logistic regression, k-means, and random forest.
- Used R, SAS, and Python for model creation, visualization, and data cleaning.
- Created machine learning pipelines in R and Python to build multiple models for predictive analysis and saved them with pickle (a sketch follows this list).
- Used PySpark and NoSQL databases to create models and predictions for the data.
- Used Excel modeling and simulation with @RISK and PrecisionTree to build decision trees for solving multivariate problems.
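A minimal scikit-learn sketch of the classification-pipeline-plus-pickle pattern mentioned above; the synthetic data and output file name are illustrative assumptions, not project artifacts.

```python
# Hypothetical sketch: build a classification pipeline and persist it with pickle.
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real feature matrix.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features, then fit a logistic regression classifier.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)
print("holdout accuracy:", pipeline.score(X_test, y_test))

# Save the fitted pipeline for later scoring.
with open("model.pkl", "wb") as f:
    pickle.dump(pipeline, f)
```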
Confidential
Big Data and IoT Engineer
Responsibilities:
- Leveraged Hadoop and HDP to perform large-scale data ingestion and data analysis
- Applied regression analysis and reinforcement learning techniques on projects
- Used Excel modeling and simulation with @RISK and PrecisionTree to build decision trees for solving multivariate problems.
- Completed a neural network POC for the client.
- Created ML pipelines in R to perform data cleaning and visualization, and used Python MLlib in Anaconda to create machine learning models and analyses.
- Derived logic from SAS analysis to create new SQL metadata structures on HDFS
- Converted SAS POC code into R and PySpark for data analysis on legacy data
- Performed predictive modeling and feature selection on large datasets in HDFS using PySpark
- Worked on multiple Data extraction and ingestion use cases.
- Worked on shell scripting and performed daily job troubleshooting and monitoring
- Good expertise in the Base SAS 9.1.3 reporting tool
- Analyzed Hadoop clusters using big data analytic tools including Hive, Spark, and MapReduce
- Conducted in-depth research on Hive to analyze partitioned and bucketed data
- Leveraged Sqoop to import data from RDBMS into HDFS
- Developed an ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
- Performed cleaning and filtering on imported data using Hive and MapReduce
- Exported an HDFS Parquet file of customer data from HDFS to MySQL (a sketch follows this list).
- Extensive experience in data ingestion technologies such as Flume and Kafka
- Used passwordless SSH to connect to various Dev and production environments.
- Imported multiple SQL/NoSQL database tables into HDFS with default delimiters using non-default file formats
- Employed security using an authentication and authorization system such as Kerberos.
- Provided guidance and techniques for cluster performance tuning and high availability.
- Developed Spark functions and MapReduce programs to parse the historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
- Created Hive queries along with Oozie workflows using shell scripts and Jenkins
- Created visualizations in MicroStrategy in the form of enterprise dashboards and reports
- Created SQL queries in MicroStrategy to filter out and optimize results for better datasets
- Wrote Hive UDFs to extract data from SQL databases as well as other RDBMS databases.
- Created test cases during two-week sprints using agile methodology.
- Used CI/CD with Git to run automated workflows for code deployment to all platforms
- Designed data visualization to present current impact and growth.
- Provided Hadoop training regarding installation, configuration, and rack awareness.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Organized content to optimize query performance.
- Analyzed web/application log data and streamed it to HDFS using Kafka.
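A minimal PySpark sketch of the Parquet-to-MySQL export mentioned above; the HDFS path, JDBC URL, table name, and credentials are illustrative placeholders.

```python
# Hypothetical sketch: export customer data stored as Parquet in HDFS to MySQL over JDBC.
# Path, URL, table, and credentials are placeholders; the MySQL JDBC driver must be on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-to-mysql-sketch").getOrCreate()

# Read the curated customer dataset from the data lake.
customers = spark.read.parquet("hdfs:///data/lake/curated/customers")

# Write it to a MySQL table, overwriting any previous load.
(customers.write
    .format("jdbc")
    .option("url", "jdbc:mysql://dbhost:3306/analytics")
    .option("dbtable", "customers")
    .option("user", "etl_user")
    .option("password", "etl_password")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .mode("overwrite")
    .save())
```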
Confidential
Senior Data Analyst/Data Scientist
Responsibilities:
- Leveraged Hadoop and HDP to analyze massive amounts of call data and identify and improve churn rates for the clients
- Analyzed Hadoop clusters using big data analytic tools including Pig, Hive, and MapReduce
- Conducted in-depth research on Hive to analyze partitioned and bucketed data
- Developed NiFi flows to automate the loading of data into HDFS and Hive for data pre-processing
- Architected 10-node Hadoop clusters with Hortonworks
- Successfully implemented HWX on a 15-node cluster in AWS
- Used logistic regression and other ML algorithms to provide churn prediction results and improve churn by 0.5 to 1% (a sketch follows this list).
- Worked on shell scripting and performed daily job troubleshooting and monitoring
- Worked on various data extraction and ingestion use cases for multiple projects.
- Leveraged Sqoop to import data from RDBMS into HDFS
- Experience in utilizing SAS functions, procedures, macros, and other SAS applications for data updates, data cleansing, and reporting.
- Skilled in merging SAS datasets, preparing data, producing and validating reports, applying SAS formats, and managing data.
- Migrated existing SAS ETL and analytics code to Spark.
- Developed an ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
- Performed cleaning and filtering on imported data using Hive and MapReduce
- Converted Python notebooks in Jupyter to PySpark notebooks using Pandas and other libraries
- Used R to create machine learning pipelines to clean and visualize data
- Exported an HDFS Parquet file of customer data from HDFS to MySQL.
- Extensive experience in data ingestion technologies such as Flume, Kafka, and NiFi
- Used Kafka, NiFi, and Flume to bring real-time and near-real-time data into HDFS and HBase from third-party sources and data lakes.
- Used the Scala build tool (sbt) with Spark to package applications.
- Used passwordless SSH to connect to various Dev and production environments.
- Imported multiple NoSQL database tables into HDFS with default delimiters using non-default file formats
- Employed security using an authentication and authorization system such as Kerberos.
- Provided guidance and techniques for cluster performance tuning and high availability.
- Developed Spark functions and MapReduce programs to parse the historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
- Created Hive queries along with Oozie workflows from a Java application using HQL libraries.
- Wrote Hive UDFs to extract data from NoSQL databases as well as other RDBMS databases using HQL.
- Created test cases during two-week sprints using Agile methodology.
- Designed data visualization to present current impact and growth.
- Developed Natural Language Processing and Machine Learning Systems using Python/C++.
- Created MapReduce programs in Python and ran them on CentOS using the shell.
- Provided Hadoop training regarding installation, configuration, and rack awareness.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
- Created AWS EMR and EC2 clusters to mimic on-premise HWX clusters
- Created roles and users in AWS using the dashboard
- Migrated 50 TB of data using AWS Direct Connect to AWS buckets and Redshift.
- Fixed any stale configurations on the cluster and cloud setup.
- Organized content to optimize query performance.
- Analyzed the web log data.
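A minimal Spark MLlib sketch of the logistic-regression churn model referenced above; the input path, feature columns, and label column are illustrative assumptions rather than the client's actual schema.

```python
# Hypothetical sketch: churn prediction with a Spark ML pipeline.
# The HDFS path and column names (usage_minutes, num_complaints, tenure_months, churned) are placeholders.
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("churn-sketch").getOrCreate()

# Load pre-aggregated call/usage features from the data lake.
calls = spark.read.parquet("hdfs:///data/lake/curated/call_features")
train, test = calls.randomSplit([0.8, 0.2], seed=42)

# Assemble raw feature columns into a single vector and fit logistic regression.
assembler = VectorAssembler(
    inputCols=["usage_minutes", "num_complaints", "tenure_months"],
    outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="churned")
model = Pipeline(stages=[assembler, lr]).fit(train)

# Score the holdout set and report area under the ROC curve.
predictions = model.transform(test)
auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(predictions)
print("holdout AUC:", auc)
```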
Confidential
Senior Big Data Engineer/Consultant
Responsibilities:
- Provided Hadoop ecosystem guidance and topology, along with feature and upgrade best practices and SME support on product use, customization, and maintenance
- Installed and upgraded the Apache Hadoop ecosystem on a cluster using Ambari and the command line.
- Configured and memory-tuned Apache Spark/Scala jobs within Hortonworks.
- Applied regression analysis and reinforcement learning techniques on projects
- Used the incremental import technique to import various data formats.
- Used various legacy Java APIs and classes within new Scala/Spark functions.
- Exported an HDFS Parquet file of customer data from HDFS to MySQL.
- Extensive experience in data ingestion technologies such as Flume, Kafka, and NiFi
- Used Kafka, NiFi, and Flume to bring real-time and near-real-time data into HDFS and HBase from third-party sources and data lakes.
- Created various Oozie jobs in the Hortonworks distribution and performed error handling in Oozie for the same.
- Troubleshot various performance and memory management issues with Spark jobs.
- Tuned data structures and serialized RDD storage using Kryo serialization (a sketch follows this list).
- Used the Scala build tool (sbt) with Spark to package applications.
- Used passwordless SSH to connect to various Dev and production environments.
- Imported multiple NoSQL database tables into HDFS with default delimiters using non-default file formats
- Employed security using an authentication and authorization system such as Kerberos.
- Provided guidance and techniques for cluster performance tuning and high availability.
- Developed Spark functions and MapReduce programs to parse the historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
- Created Hive queries along with Oozie workflows using a Java application
- Wrote Hive UDFs to extract data from NoSQL databases as well as other RDBMS databases.
- Created test cases during two-week sprints using Agile methodology.
- Designed data visualization to present current impact and growth.
- Developed Natural Language Processing and Machine Learning Systems using Python/C++.
- Created ML pipelines in R to perform data cleaning and visualization, and used Python MLlib in Jupyter/Spyder to create machine learning models and analyses.
- Created MapReduce programs in Python and ran them on CentOS using the shell.
- Provided Hadoop training regarding installation, configuration, and rack awareness.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
- Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
- Fixed any stale configurations on the cluster and cloud setup.
- Organized content to optimize query performance.
- Analyzed the web log data.
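A minimal sketch of the kind of Spark memory and Kryo serialization tuning described above; the specific memory sizes, partition counts, and paths are illustrative assumptions, not the project's actual settings.

```python
# Hypothetical sketch: configure a Spark job with Kryo serialization and tuned memory settings.
# All values shown are placeholders chosen for illustration.
from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tuned-spark-job-sketch")
         .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
         .config("spark.kryoserializer.buffer.max", "512m")
         .config("spark.executor.memory", "8g")
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "200")
         .getOrCreate())

# Persist a frequently reused dataset so repeated actions avoid recomputation.
events = spark.read.parquet("hdfs:///data/lake/raw/events")
events.persist(StorageLevel.MEMORY_AND_DISK)
print(events.count())
```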
Confidential
Hadoop Data Engineer/Administrator
Responsibilities:
- Provided Hadoop ecosystem guidance and topology, along with feature awareness, upgrade best practices, and SME support on product use, customization, and maintenance in Hortonworks.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Replaced default Derby metadata storage system for Hive with MySQL system
- Executed queries using Hive and developed Map-Reduce jobs to analyze data.
- Developed Pig Latin scripts to extract the data from the web server output files and load it into HDFS (a PySpark analogue is sketched after this list).
- Developed Pig UDFs to preprocess the data for analysis.
- Developed Hive queries for the analysts.
- Worked on analyzing Hadoop clusters and different big data analytic tools including Pig, the HBase NoSQL database, and Sqoop.
- Imported and exported data in HDFS and Hive using Sqoop in Hortonworks.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed the data.
- Environment setup and configuration of Apache Hadoop was done using Cloudera Manager.
- Involved in loading data from LINUX and UNIX file system to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Shell and Pig.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Administrator for Pig, Hive, and HBase, installing updates, patches, and upgrades.
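A brief PySpark analogue of the Pig Latin web-server-log extraction described above (shown in Python rather than Pig for illustration); the log path, log pattern, and output location are assumptions.

```python
# Hypothetical sketch: parse web server access logs and land structured records in HDFS.
# This mirrors the Pig Latin extraction flow in PySpark; paths and the log pattern are placeholders.
import re

from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("weblog-extract-sketch").getOrCreate()

# Common Log Format: host, timestamp, request, status, bytes.
LOG_PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)')

def parse_line(line):
    """Return a structured row for a log line, or None if it does not match."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None
    host, ts, method, url, status, size = m.groups()
    return Row(host=host, timestamp=ts, method=method, url=url,
               status=int(status), bytes=0 if size == "-" else int(size))

raw = spark.sparkContext.textFile("hdfs:///logs/webserver/access_log*")
parsed = raw.map(parse_line).filter(lambda r: r is not None).toDF()

# Store the cleaned records for downstream Hive analysis.
parsed.write.mode("overwrite").parquet("hdfs:///data/lake/raw/web_logs")
```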
Confidential
Technical Consultant
Responsibilities:
- Led Hadoop upgrade projects for Confidential to create a portal with enhanced functionality using best practices.
- Created governance documentation for Hadoop administration sites post-migration to manage metadata
- Replaced default Derby metadata storage system for Hive with MySQL system.
- Developed Hive queries for the analysts.
- Managed Hadoop log files.
- Analyzed the web log data using HiveQL (a sketch follows this list).
- Provided Hadoop training regarding installation configuration and rack awareness.
- Supported code/design analysis, strategy development and project planning.
- Shared responsibility for administration of Hadoop, Hive and Pig
- Created design documents and provided expertise in Azure Proxy ADFS sync.
- Provided recommendations on best-practice functionality, translating business needs for various departments.
- Translated requirements into system-based functionality, and planned and executed projects from inception to implementation.
- Provided Hadoop ecosystem guidance and mentorship to various business analysts and the infrastructure team.
- Created a content migration POC from OpenText Livelink to the portal.
- Used various legacy Java APIs and classes within new Scala/Spark functions.
- Exported an HDFS file of customer data from HDFS to MySQL.
- Created Various Oozie jobs in Hortonworks distribution.
- Tested internal Claims/Kerberos-based authentication.
- Upgraded and migrated 100 GB of content with site collection artifacts to the new intranet
- Provisioned and administered sites using PowerShell scripts for site creation, restore, import/export and backup as an administrator.
- Used publishing best practices to create publishing templates and artifacts
- Organized content to optimize query performance.
- Maintained current industry knowledge of development concepts, best practices, and procedures for 2013/2016 solutions.
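A short sketch of the kind of HiveQL web-log analysis mentioned above, run here through SparkSession's Hive support; the web_logs table and its columns are illustrative assumptions.

```python
# Hypothetical sketch: HiveQL analysis of web log data via Spark's Hive support.
# The web_logs table and its columns (status, url, log_date) are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("weblog-hiveql-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Daily error counts by URL, restricted to HTTP 4xx/5xx responses.
errors_by_url = spark.sql("""
    SELECT log_date,
           url,
           COUNT(*) AS error_count
    FROM web_logs
    WHERE status >= 400
    GROUP BY log_date, url
    ORDER BY error_count DESC
    LIMIT 20
""")

errors_by_url.show(truncate=False)
```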
Confidential
Technical Consultant
Responsibilities:
- Led the upgrade projects for LSUC to create a data and extranet application with enhanced functionality using best practices.
- Created authentication for the Hadoop application using best practices.
- Created a governance plan for the Hadoop ecosystem and its applications.
- Performed installation, configuration, and migration of server setup, farm setup, and database setup along with datasets from various sources
- Extracted data from RDBMS into HDFS and HBase using HCatalog functionality; cleaned the data using Pig scripts and stored it in various locations (a sketch follows this list).
- Created a Java application for running scripts and queries to perform ETL on data lakes and data warehouses
- Used PowerShell commands to perform all administration-related activities such as site creation, backup, and restore to other site collections; changed settings via the admin portal and updated the SP portal.
- Monitored ULS and timer job logs to identify any issues and discrepancies from an administration perspective.
- Performed troubleshooting activities using SP logs and best practices.
- Provided technical guidance to the team in the technologies related to the development and support of an enterprise SharePoint environment
- Used jQuery, JavaScript, and Knockout.js to customize functionality on the site
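A brief PySpark analogue of the extract-and-clean flow described above (the original work used HCatalog and Pig); the JDBC connection details, column names, and target table are illustrative assumptions.

```python
# Hypothetical sketch: pull a table from an RDBMS over JDBC, clean it, and store it in Hive.
# Connection details, column names, and the target table are placeholders;
# the JDBC driver must be on the classpath and a "staging" Hive database is assumed to exist.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("rdbms-extract-clean-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Extract the source table over JDBC.
members = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://dbhost:3306/source_db")
           .option("dbtable", "members")
           .option("user", "etl_user")
           .option("password", "etl_password")
           .load())

# Basic cleaning: drop rows missing a key, deduplicate, and normalize email case.
cleaned = (members
           .dropna(subset=["member_id"])
           .dropDuplicates(["member_id"])
           .withColumn("email", F.lower(F.col("email"))))

# Store the cleaned data as a managed Hive table for downstream use.
cleaned.write.mode("overwrite").saveAsTable("staging.members_clean")
```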