
Big Data Solution Designer Resume


SUMMARY

  • 10+ years of experience in Designing and Implementing Customer Intranet and Extranet Portal solutions for various clients in different sectors.
  • More than Four years of experience in Hadoop Development/Administration built on six years of experience in Java Application Development.
  • Good knowledge of the Hadoop ecosystem; recognized subject-matter expert (SME).
  • Experienced in working with Big Data and the Hadoop Distributed File System (HDFS).
  • Three years of experience installing, configuring, and testing Hadoop ecosystem components.
  • Hands-on experience working with ecosystem components such as Hive, Pig, Sqoop, MapReduce, Flume, and Oozie.
  • Strong knowledge of Hadoop and Hive, including Hive's analytical functions.
  • Captured data from existing databases that provide SQL interfaces using Sqoop.
  • Efficient in building Hive, Pig, and MapReduce scripts.
  • Implemented proofs of concept on the Hadoop stack and different big data analytics tools, including migration from different databases (e.g., MySQL, Cassandra) to Hadoop.
  • Successfully loaded files into Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Loaded datasets into Hive for ETL operations.
  • Good knowledge of Hadoop cluster architecture and cluster monitoring.
  • Experience using DbVisualizer, ZooKeeper, and Cloudera Manager.
  • Worked in different areas including Collaboration, Document Management, Portal and Web Content Management.
  • Expert in SharePoint Governance and used Best Practices for numerous client requirements.
  • Implemented Taxonomy and Folksonomy for clients
  • Well-rounded experience across the full software development life cycle.
  • Excellent analytical, problem solving, and communication skills.
  • Design and development skills for multi-tiered web applications using VB.NET and Visual Studio .NET; Brainbench certified in ASP.NET.
  • Microsoft Certified Professional
  • Certified Enterprise Architect from University of Toronto.
  • Certified Hadoop Developer and Administrator.

TECHNICAL SKILLS

Programming Languages: .NET, VB.NET, C#.NET, ASP.NET, Java, J2EE, C++, C, Visual Basic 6.0, Python, R

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Giraph, Tableau, SAS.

UNIX Tools: Apache, Yum, RPM

Web Technologies: ASP.NET, SharePoint Apps, AJAX, jQuery, InfoPath 2010, JSP, PHP, XML, XSLT, XPath, VBScript, JavaScript, Unified Access Gateway (UAG), BHOLD, SPML, ADFS, Bash

Database Technologies: SQL Server 2005/2008, SQL Server Reporting Services, Forefront Identity Management 2010, Report Builder 3.0, Crystal Reports, Oracle 8i/9i/10g, DB2, MySQL, Cassandra, MongoDB

Operating Systems: Linux AS3.0, Solaris, WINDOWS NT/2000/XP/2003

Enterprise Applications: Forefront Identity Management 2010, Unified Access Gateway, Documentum, MQ Series, Siebel, eRoom, MS Office 2003/2007, MS FrontPage 2003

Web Services: SOAP, DAML-S, BPEL4WS, FIM Web Service, Kerberos

Source Control Systems: Visual Source Safe, Subversion, PVCS, ClearCase, Team Foundation Server.

Application Server: Tomcat, BEA Weblogic, IBM Web Sphere, MKS

PROFESSIONAL EXPERIENCE

Confidential

Big Data Solution Designer

Responsibilities:

  • Leveraged Hadoop and a Cloudera cluster to perform large-scale data extraction and ingestion into the data lake
  • Applied regression and reinforcement learning techniques on projects for predictive analytics
  • Delivered neural network use cases using CNNs and RNNs.
  • Used Python/PySpark and NoSQL databases to build models and generate predictions on the data.
  • Converted SAS POC code into Python/PySpark for data analysis on legacy data
  • Performed predictive modeling and feature selection on large datasets in HDFS
  • Created sentiment analysis for chatbots in Scala and Python.
  • Developed a Scala project to profile pdsium entities and flow them into the HDFS data lake
  • Analyzed Hadoop clusters using Big Data ecosystem tools including Hive, Spark, and MapReduce
  • Conducted in-depth research on Hive to analyze partitioned and bucketed data
  • Leveraged Sqoop to import data from RDBMS into HDFS
  • Collaborated on Java projects to bring data from various sources into Hive/HDFS.
  • Used CI/CD with Jenkins and Agile methodologies on the projects
  • Developed an ETL framework using data warehousing and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
  • Performed cleaning and filtering on imported data using Hive and MapReduce.
  • Exported Parquet files of customer data from HDFS to MySQL.
  • Analyzed web/application real-time log data and streamed it to HDFS using Kafka/Spark Streaming; a sketch of this pattern follows this list.
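A minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS log ingestion referenced above; the broker address, topic name, and HDFS paths are hypothetical placeholders rather than values from this engagement.

    # Minimal PySpark Structured Streaming sketch: Kafka log topic -> HDFS (Parquet).
    # Broker, topic, and paths are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("web-log-ingest").getOrCreate()

    # Read raw log events from a Kafka topic as a streaming DataFrame.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
           .option("subscribe", "web_app_logs")                 # placeholder topic
           .option("startingOffsets", "latest")
           .load())

    # Kafka delivers key/value as binary; keep the value as a plain string line.
    logs = raw.select(col("value").cast("string").alias("log_line"), col("timestamp"))

    # Continuously append the stream to HDFS in Parquet format.
    query = (logs.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/lake/web_logs")            # placeholder path
             .option("checkpointLocation", "hdfs:///tmp/chk/web_logs")
             .outputMode("append")
             .start())
    query.awaitTermination()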

Confidential

Responsibilities:

  • Applied regression and reinforcement learning techniques on projects for predictive analytics
  • Delivered neural network use cases using CNNs and RNNs.
  • Used Keras and TensorFlow to model deep learning datasets and predict results for various financial and non-financial projects.
  • Created option-plan financing simulation and prediction models using scikit-learn, PySpark, Hadoop, and supervised learning
  • Created ML pipelines for analytics on datasets using regression and classification models such as logistic regression, k-means, and random forest.
  • Used R, SAS, and Python for model creation, visualization, and data cleaning.
  • Created machine learning pipelines in R and Python to build multiple models for predictive analysis and persisted them with pickle; a sketch of this pattern follows this list.
  • Used PySpark and NoSQL databases to build models and generate predictions on the data.
  • Used Excel modeling and simulation with @RISK and PrecisionTree to build decision trees for multivariate problems.
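As a rough illustration of the Python side of these pipelines, the sketch below fits a scikit-learn classification pipeline and saves it with pickle; the dataset path, feature columns, and target column are hypothetical.

    # Minimal scikit-learn sketch: fit a classification pipeline, then persist it with pickle.
    # The CSV path and the "target" column are illustrative placeholders.
    import pickle
    import pandas as pd
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("training_data.csv")        # placeholder dataset
    X, y = df.drop(columns=["target"]), df["target"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Scale features, then fit a logistic regression classifier.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    pipeline.fit(X_train, y_train)
    print("holdout accuracy:", pipeline.score(X_test, y_test))

    # Persist the fitted pipeline so it can be reloaded later for scoring.
    with open("prediction_model.pkl", "wb") as f:
        pickle.dump(pipeline, f)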

Confidential

Big Data and IoT Engineer

Responsibilities:

  • Leveraged Hadoop and HDP to perform large-scale data ingestion and data analysis
  • Applied regression analysis and reinforcement learning techniques on projects
  • Used Excel modeling and simulation with @RISK and PrecisionTree to build decision trees for multivariate problems.
  • Delivered a neural network POC for the client.
  • Created ML pipelines in R for data cleaning and visualization, and used MLlib from Python in Anaconda to build machine learning models and analyses.
  • Derived logic from SAS analyses to create new SQL metadata structures on HDFS
  • Converted SAS POC code into R and PySpark for data analysis on legacy data
  • Performed predictive modeling and feature selection on large datasets in HDFS using PySpark
  • Worked on multiple Data extraction and ingestion use cases.
  • Worked in Shell Scripting and performed daily job troubleshooting and monitoring
  • Good expertise in the reporting tool Base SAS 9.1.3
  • Analyzed Hadoop clusters using big data analytics tools including Hive, Spark, and MapReduce
  • Conducted in-depth research on Hive to analyze partitioned and bucketed data
  • Leveraged Sqoop to import data from RDBMS into HDFS
  • Developed ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
  • Performed cleaning and filtering on imported data using Hive and MapReduce
  • Exported Parquet files of customer data from HDFS to MySQL.
  • Extensive experience in data ingestion technologies such as Flume and Kafka
  • Used passwordless SSH to connect to various environments in both development and production.
  • Imported multiple SQL/NoSQL database tables into HDFS with default delimiters using non-default file formats
  • Employed security using an authentication and authorization system such as Kerberos.
  • Provided guidance and techniques for cluster performance tuning and high availability.
  • Developed Spark functions and MapReduce programs to parse historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW; a sketch of this pattern follows this list.
  • Created Hive queries along with Oozie workflows using shell scripts and Jenkins
  • Created visualizations in MicroStrategy in the form of enterprise dashboards and reports
  • Created SQL queries in MicroStrategy to filter out and optimize results for better datasets
  • Wrote Hive UDFs to extract data from SQL databases as well as other RDBMS databases.
  • Created test cases during two-week sprints using agile methodology.
  • Used CI/CD with Git to run automated workflows for code deployment to all platforms
  • Designed data visualization to present current impact and growth.
  • Provided Hadoop training covering installation, configuration, and rack awareness.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
  • Organized content to optimize query performance.
  • Analyzed web/application log data and streamed it to HDFS using Kafka.
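The Spark-to-partitioned-tables step referenced above roughly follows the pattern sketched here; the staging path, column names, and warehouse table are hypothetical placeholders.

    # Minimal PySpark sketch: parse staged historical data from HDFS, refine it,
    # and store the result in a partitioned Hive table in the warehouse.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_date

    spark = (SparkSession.builder
             .appName("edw-refine")
             .enableHiveSupport()      # required to write Hive-managed tables
             .getOrCreate())

    # Raw staging data landed in HDFS (placeholder path and schema).
    raw = spark.read.option("header", True).csv("hdfs:///staging/historical/")

    # Basic refinement: typed partition key, drop incomplete rows.
    refined = (raw
               .withColumn("event_date", to_date(col("event_ts")))
               .dropna(subset=["customer_id", "event_date"]))

    # Write into a partitioned table in the EDW (placeholder database/table name).
    (refined.write
     .mode("overwrite")
     .partitionBy("event_date")
     .saveAsTable("edw.refined_events"))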

Confidential

Senior Data Analyst/Data Scientist

Responsibilities:

  • Leveraged Hadoop and HDP to analyze massive amounts of call data and to identify and improve churn rates for clients
  • Analyzed Hadoop clusters using big data analytics tools including Pig, Hive, and MapReduce
  • Conducted in-depth research on Hive to analyze partitioned and bucketed data
  • Developed NiFi flows to automate the loading of data into HDFS and Hive for data pre-processing
  • Architected 10-node Hadoop clusters with HortonWorks
  • Successfully implemented HWX on a 15-node cluster in AWS
  • Used logistic regression and other ML algorithms to produce churn-prediction results, improving churn performance by 0.5 to 1%; a sketch of this pattern follows this list.
  • Worked in Shell Scripting and performed daily job troubleshooting and monitoring
  • Worked on various data extraction and ingestion use cases for multiple projects.
  • Leveraged Sqoop to import data from RDBMS into HDFS
  • Experience in utilizing SAS functions, procedures, macros, and other SAS applications for data updates, data cleansing, and reporting.
  • Skilled in merging SAS datasets, preparing data, producing and validating reports, SAS formats, and managing data.
  • Migrated existing SAS ETL and analytics code to Spark.
  • Developed ETL framework using Python and Hive (including daily runs, error handling, and logging) to glean useful data and improve vendor negotiations
  • Performed cleaning and filtering on imported data using Hive and MapReduce
  • Converted Python notebooks in Jupyter to PySpark notebooks using pandas and other libraries
  • Used R to create machine learning pipelines to clean and visualize data.
  • Exported Parquet files of customer data from HDFS to MySQL.
  • Extensive experience in data ingestion technologies such as Flume, Kafka, and NiFi.
  • Used Kafka, NiFi, and Flume to ingest real-time and near-real-time data into HDFS and HBase from third-party sources and data lakes.
  • Used the Scala build tool (sbt) with Spark to package applications.
  • Used passwordless SSH to connect to various environments in both development and production.
  • Imported multiple NoSQL database tables into HDFS with default delimiters using non-default file formats
  • Employed security using an authentication and authorization system such as Kerberos.
  • Provided guidance and techniques for cluster performance tuning and high availability.
  • Developed Spark functions and MapReduce programs to parse historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
  • Created Hive queries along with Oozie workflows from a Java application using HQL libraries.
  • Wrote Hive UDFs to extract data from NoSQL databases as well as other RDBMS databases using HQL.
  • Created test cases during two-week sprints using agile methodology.
  • Designed data visualization to present current impact and growth.
  • Developed Natural Language Processing and Machine Learning Systems using Python/C++.
  • Created MapReduce programs in Python and ran them on CentOS using the shell.
  • Provided Hadoop training covering installation, configuration, and rack awareness.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
  • Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
  • Created AWS EMR and EC2 clusters to mimic on-premise HWX clusters
  • Created roles and users in AWS using the dashboard
  • Migrated 50 TB of data over AWS Direct Connect to S3 buckets and Redshift on AWS.
  • Fixed stale configurations on the cluster and cloud setup.
  • Organized content to optimize query performance.
  • Analyzed the web log data.
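A minimal Spark MLlib sketch of the churn-prediction modeling mentioned above; the Hive table, feature columns, and label column are hypothetical placeholders.

    # Minimal PySpark MLlib sketch: logistic regression for churn prediction.
    # Table name, feature columns, and label column are illustrative placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("churn-model").enableHiveSupport().getOrCreate()

    # Call-detail features prepared earlier in Hive (placeholder table and columns).
    calls = spark.table("analytics.call_features")
    feature_cols = ["total_minutes", "dropped_calls", "support_tickets", "tenure_months"]

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="churned")

    train, test = calls.randomSplit([0.8, 0.2], seed=42)
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Evaluate area under ROC on the holdout split.
    auc = BinaryClassificationEvaluator(labelCol="churned").evaluate(model.transform(test))
    print("holdout AUC:", round(auc, 3))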

Confidential

Senior Big Data Engineer/Consultant

Responsibilities:

  • Provided Hadoop ecosystem guidance and topology; provided feature and upgrade best practices and SME support on product use, customizations, and maintenance
  • Installed and upgraded the Apache Hadoop ecosystem on a cluster using Ambari and the command line.
  • Configured and memory-tuned Apache Spark/Scala jobs within Hortonworks.
  • Applied regression analysis and reinforcement learning techniques on projects
  • Used incremental import techniques to import various kinds of data formats.
  • Used various legacy Java APIs and classes within new Scala/Spark functions.
  • Exported Parquet files of customer data from HDFS to MySQL.
  • Extensive experience in data ingestion technologies such as Flume, Kafka, and NiFi.
  • Used Kafka, NiFi, and Flume to ingest real-time and near-real-time data into HDFS and HBase from third-party sources and data lakes.
  • Created various Oozie jobs in the Hortonworks distribution and performed error handling in Oozie for the same.
  • Troubleshot various performance and memory management issues with Spark jobs.
  • Tuned data structures and serialized RDD storage using Kryo serialization; a sketch of this configuration follows this list.
  • Used the Scala build tool (sbt) with Spark to package applications.
  • Used passwordless SSH to connect to various environments in both development and production.
  • Imported multiple NoSQL database tables into HDFS with default delimiters using non-default file formats
  • Employed security using an authentication and authorization system such as Kerberos.
  • Provided guidance and techniques for cluster performance tuning and high availability.
  • Developed Spark functions and MapReduce programs to parse historical data, populate staging tables and HDFS, and store the refined data in partitioned tables in the EDW.
  • Created Hive queries along with Oozie workflows using a Java application
  • Wrote Hive UDFs to extract data from NoSQL databases as well as other RDBMS databases.
  • Created test cases during two-week sprints using agile methodology.
  • Designed data visualization to present current impact and growth.
  • Developed Natural Language Processing and Machine Learning Systems using Python/C++.
  • Created ML pipelines in R for data cleaning and visualization, and used MLlib from Python in Jupyter/Spyder to build machine learning models and analyses.
  • Created MapReduce programs in Python and ran them on CentOS using the shell.
  • Provided Hadoop training covering installation, configuration, and rack awareness.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Familiar with job scheduling using the Fair Scheduler so that CPU time is well distributed among all jobs.
  • Involved in regular Hadoop cluster maintenance such as patching security holes and updating system packages.
  • Fixed stale configurations on the cluster and cloud setup.
  • Organized content to optimize query performance.
  • Analyzed the web log data.
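The Kryo serialization and memory tuning noted above typically comes down to Spark configuration along these lines; the specific values below are illustrative placeholders, not tuned recommendations for any particular cluster.

    # Minimal PySpark sketch: enable Kryo serialization and set basic memory options,
    # then cache a DataFrame read from HDFS. All values are illustrative placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("kryo-tuning-demo")
             .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
             .config("spark.kryoserializer.buffer.max", "256m")   # placeholder buffer cap
             .config("spark.executor.memory", "4g")               # placeholder executor heap
             .config("spark.memory.fraction", "0.6")              # default, shown explicitly
             .getOrCreate())

    # Cache historical data so repeated queries reuse Spark's compact in-memory format.
    df = spark.read.parquet("hdfs:///data/historical/")           # placeholder path
    df.cache()
    print("rows cached:", df.count())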

Confidential

Hadoop Data Engineer/Administrator

Responsibilities:

  • Provided Hadoop ecosystem guidance and topology; provided feature awareness, upgrade best practices, and SME support on product use, customizations, and maintenance in Hortonworks.
  • Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Replaced the default Derby metadata storage system for Hive with MySQL.
  • Executed queries using Hive and developed MapReduce jobs to analyze data.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Developed Pig UDFs to preprocess the data for analysis.
  • Developed Hive queries for the analysts.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, the HBase NoSQL database, and Sqoop.
  • Imported and exported data in HDFS and Hive using Sqoop in Hortonworks.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed the data.
  • Performed environment setup/configuration on Apache Hadoop using Cloudera Manager.
  • Involved in loading data from Linux and UNIX file systems to HDFS.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Shell and Pig.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with teh infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Administered Pig, Hive, and HBase, installing updates, patches, and upgrades.

Confidential

Technical Consultant

Responsibilities:

  • Led Hadoop upgrade projects for Confidential to create a portal with enhanced functionality using best practices.
  • Created governance documentation for Hadoop administration sites post-migration to manage metadata
  • Replaced the default Derby metadata storage system for Hive with MySQL.
  • Developed Hive queries for the analysts.
  • Managed Hadoop log files.
  • Analyzed the web log data using HiveQL.
  • Provided Hadoop training covering installation, configuration, and rack awareness.
  • Supported code/design analysis, strategy development and project planning.
  • Shared responsibility for administration of Hadoop, Hive and Pig
  • Created design documents and provided expertise in Azure Proxy ADFS sync.
  • Provided recommendations on best-practice functionality and translated business needs for various departments.
  • Translated requirements into system-based functionality, and planned and executed projects from inception to implementation.
  • Provided Hadoop ecosystem guidance and mentorship to various business analysts and the infrastructure team.
  • Created a content migration POC from OpenText Livelink to the portal.
  • Used various legacy Java APIs and classes within new Scala/Spark functions.
  • Exported HDFS-format files of customer data from HDFS to MySQL.
  • Created various Oozie jobs in the Hortonworks distribution.
  • Tested internal claims/Kerberos-based authentication.
  • Upgraded and migrated 100 GB of content with site collection artifacts to the new intranet
  • Provisioned and administered sites using PowerShell scripts for site creation, restore, import/export, and backup as an administrator.
  • Used publishing best practices to create publishing templates and artifacts
  • Organized content to optimize query performance.
  • Maintained current industry knowledge of development concepts, best practices, and procedures for 2013/2016 solutions.

Confidential

Technical Consultant

Responsibilities:

  • Led the upgrade projects for LSUC to create a data and extranet application with enhanced functionality using best practices.
  • Created authentication for the Hadoop application using best practices.
  • Created a governance plan for the Hadoop ecosystem and its applications.
  • Performed installation, configuration, and migration of the server setup, farm setup, and database setup, along with datasets from various sources
  • Extracted data from RDBMS into HDFS and HBase using HCatalog functionality; cleaned the data using Pig scripts and stored it in various locations.
  • Created a Java application for running scripts and queries to perform ETL on data lakes and data warehouses
  • Used PowerShell commands to perform all administration-related activities such as site creation, backup, and restore to other site collections; changed settings in the admin portal and updated the SharePoint portal.
  • Monitored ULS and timer job logs for issues and discrepancies from an administration perspective.
  • Performed troubleshooting activities using SharePoint logs and best practices.
  • Provided technical guidance to the team in the technologies related to the development and support of an enterprise SharePoint environment
  • Used jQuery, JavaScript, and Knockout.js to customize functionality on the site
