Big Data Hadoop Architect Resume
New York, NY
SUMMARY
- Over 15 years of IT experience in analysis, design and development using Big Data, Hadoop, HDFS, MapReduce and the Hadoop ecosystem (Pig, Hive, Impala, Spark, Scala), Java and J2EE.
- Experienced in implementing end-to-end Big Data Hadoop solutions and analytics using various Hadoop distributions such as Cloudera Distribution of Hadoop (CDH 5.5), Hortonworks Sandbox (HDP) and the MapR distribution.
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, Solr and Splunk.
- Extensive experience in Java and J2EE technologies such as Servlets, JSP, JSF, JDBC, JavaScript, Ext JS, Hibernate and JUnit testing.
- Expertise in using J2EE application servers such as IBM WebSphere and JBoss, and web servers such as Apache Tomcat.
- Experienced in using distributed computing architectures such as AWS products (e.g., EC2, Redshift and EMR), Hadoop and Spark, with effective use of MapReduce, SQL and Cassandra to solve big data problems.
- Experienced in different Hadoop distributions such as Cloudera (CDH3 and CDH4) and the Hortonworks Data Platform (HDP).
- Experienced with Hadoop/Hive on AWS, using both EMR and non-EMR Hadoop on EC2.
- Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, Solr, Git, Maven, Avro, JSON and Chef.
- Experienced in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java, and in extending Hive and Pig core functionality with custom UDFs.
- Experienced in working with Hadoop architects and big data users to implement new Hadoop ecosystem technologies supporting multi-tenant clusters.
- Experienced in configuring and administering Hadoop clusters using major Hadoop distributions such as Apache Hadoop and Cloudera.
- Diverse experience utilizing Java tools in business, web and client-server environments, including Java Platform, J2EE, EJB, JSP, Java Servlets, Struts and Java Database Connectivity (JDBC) technologies.
- Excellent knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1 and MRv2 (YARN).
- Expertise in integrating various data sources such as RDBMS, spreadsheets, text files, JSON and XML files.
- Excellent experience using Sqoop to import data from RDBMS into HDFS and vice versa.
- Experienced in R and Python (pandas, NumPy, scikit-learn) for statistical computing, as well as Spark MLlib, MATLAB, Excel, Minitab, SPSS and SAS.
- Experienced in implementing Service-Oriented Architecture (SOA) using web services and JMS (Java Message Service).
- Experienced in MVC (Model-View-Controller) architecture and J2EE design patterns such as Singleton and Factory.
- Extensive experience in loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Pig, Hive, Flume, Sqoop, Spark, Impala, Scala) and NoSQL databases such as MongoDB, HBase and Cassandra.
- Experienced in implementing a log producer in Scala that watches application logs, transforms incremental log entries and sends them to a Kafka- and ZooKeeper-based log collection platform.
- Excellent experience with NoSQL databases such as MongoDB and Cassandra, and in writing Apache Spark Streaming applications on a Big Data distribution in an active cluster environment (a consumer-side sketch follows this summary).
- Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
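As a minimal illustration of the Kafka/ZooKeeper log pipeline and Spark Streaming experience above, the sketch below shows a consumer-side Spark Streaming job in Python (the producer itself was implemented in Scala, as noted above); the ZooKeeper host, consumer group, topic name and batch interval are hypothetical.

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x/2.x Kafka integration

# Hypothetical application name and 10-second micro-batch interval.
sc = SparkContext(appName="AppLogStreamConsumer")
ssc = StreamingContext(sc, 10)

# Hypothetical ZooKeeper quorum, consumer group id and topic -> receiver-thread map.
logs = KafkaUtils.createStream(
    ssc,
    "zk-host:2181",
    "log-consumer-group",
    {"app-logs": 1},
)

# Each record is a (key, value) pair; keep only the log line and count lines per batch.
logs.map(lambda kv: kv[1]).count().pprint()

ssc.start()
ssc.awaitTermination()
```

Submitted with spark-submit against the cluster, this prints the number of log lines received in each micro-batch; in practice the transformation step would parse and persist the entries rather than simply count them.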
TECHNICAL SKILLS
- Random Forest
- Recommendation Engines
- Python
- H2O.ai
- Scikit-learn
- Data Governance
- Metadata Management
- IBM WebSphere
- Java (Sun Certified)
- J2EE
- SQL
- SOAP and REST Web Services
- Architecture
- Infrastructure build
- DMZ design
- Workforce planning
- Client relationship
- Incident management & root cause analysis
- Spark
- Kafka
- Flume
- Hive
- Sqoop
- HBase
- Search
- Impala
- Cloudera CDH 5.3 / 5.7
- Actuate
- UNIX
- DB2 analytical platform
- DB2
- Oracle: Design and Development
- Single Sign-On using SAML Assertion
- Java batch
- DB2: Optimization and capacity planning
PROFESSIONAL EXPERIENCE
Confidential, New York, NY
Big Data Hadoop Architect
Responsibilities:
- Implemented Random Forest and Gradient Boosting Machine (GBM) models to predict whether a corporate equity client participant is a reinvestment opportunity for the firm. In this proof of concept (POC), the models were built using H2O.ai (an open-source machine learning platform) and Python. The data was gathered from the data sources of various applications, merged, cleaned, and stored in Hadoop (Hive) for further analysis. Financial Advisors, the users of this predictive analytics engine, will be able to run more focused marketing campaigns (see the modeling sketch at the end of this role).
- Led the effort to migrate the Actuate reporting platform to Business Objects and selected a vendor after presentations from four vendors. The migration to Business Objects will result in a TCO reduction of $5 million over a period of a few years.
- Enterprise Metadata Management and Data Governance: Made significant contributions to the Data Strategy for the CES group of applications. The requirement is to have one Data Hub for all applications instead of each application using its own database; the objective is to improve data quality and standardization, reduce future development costs and provide a new data analytics platform.
- As team Architect, led the SAPPHIRE JV integration of Citigroup Smith Barney with Confidential. Responsible for all technical discussions/decisions for the SAPPHIRE application, including the Actuate reporting platform, DB2 data migration, remediation of VA scan findings, use of NAS share versus SAN for data storage, and DMZ design review with the Enterprise Infrastructure and Security Architect.
- Led SAPPHIRE migration effort from IBM Data Center to Confidential Data Center under Enterprise Stack Adoption (ESA) program.
- Led optimization of DB2 access and Java processes to handle the high data volumes resulting from on-boarding of large clients such as Google and Amazon.
- Led the effort to automate client on-boarding/migration by providing the implementation team a workflow engine that guides them through initial client setup, file uploads, reconciliation reports and the go-live checklist.
- Managed delivery of major SAPPHIRE releases, the largest staffed at approximately 9 FTEs.
- Instrumental in the formation of the first SAPPHIRE Level 1 production support team.
- Technology stack: Hadoop (Spark, Kafka, Flume, Hive, Sqoop, HBase, Cloudera CDH 5.3), H2O.ai, Python, Java, IBM WebSphere Application Server 7.1, DB2 v11, IBM MQ 6.0, Actuate iServer v11, AutoSys for job scheduling, J2EE, SOAP and REST Web Services, JIRA, Git, TeamCity
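A minimal sketch of the H2O.ai/Python modeling step from the first responsibility above, assuming the cleaned participant data from Hive has been exported to a flat file; the file name, target column and hyperparameters are hypothetical placeholders, not the firm's actual configuration.

```python
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

# Hypothetical export of the cleaned Hive table of client-participant features.
frame = h2o.import_file("participant_features.csv")
frame["reinvestment_flag"] = frame["reinvestment_flag"].asfactor()  # binary target

train, valid = frame.split_frame(ratios=[0.8], seed=42)
predictors = [c for c in frame.columns if c != "reinvestment_flag"]

# Random Forest baseline.
rf = H2ORandomForestEstimator(ntrees=200, seed=42)
rf.train(x=predictors, y="reinvestment_flag",
         training_frame=train, validation_frame=valid)

# Gradient Boosting Machine for comparison.
gbm = H2OGradientBoostingEstimator(ntrees=200, learn_rate=0.05, seed=42)
gbm.train(x=predictors, y="reinvestment_flag",
          training_frame=train, validation_frame=valid)

print("RF  AUC:", rf.auc(valid=True))
print("GBM AUC:", gbm.auc(valid=True))
```

In the actual POC the features would come from the Hive tables described above rather than a local CSV, and the better-scoring model would back the Financial Advisors' predictive analytics engine.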