Big Data Architect Resume
SUMMARY:
- 15 years of experience in analysis, design, application software development, and administration, with extensive exposure across various fields using Java 8, Scala 2.10/2.11, and Python 2.7.
- Over 5 years of experience with Big Data technologies: Apache Spark 1.6.2/2.0.2, Hadoop, MapReduce, Kafka 0.10, HDFS, HBase 1.1.2, Hive 1.2.1, BigQuery, ZooKeeper, Spring Data, Cassandra 2.1.12/3.0.0, AWS.
- Good exposure to the design, development, and support of the Apache Spark, Hadoop, and Big Data ecosystem using Apache Spark 2.0.2 (SQL + DataFrames, Spark Streaming, MLlib, GraphX), InfoSphere BigInsights 4.1 (IBM’s product), Cloudera CDH 5.9, Hortonworks HDP 2.6.3, and MapR 5.0.
- Hands-on experience with Spark, Kafka, MapReduce, Cassandra, HBase, Hive, NiFi, Sqoop, and Elasticsearch to analyze data and generate results.
- Hands-on experience with machine learning: H2O (Sparkling Water), Spark MLlib (spark.ml (DataFrames), spark.mllib (RDD)).
- Hands-on experience in installing, configuring, supporting, and managing Hadoop clusters using Apache open source products, Hortonworks HDP 2.6.3, Cloudera CDH 5.9, IBM BigInsights 4.3, and MapR 5.0.
- Good experience with Agile and Scrum methodologies.
- Good experience in UNIX/Linux and shell scripting.
- Coordinated work requests among team members, including sizing, work allocation, status reporting, defect tracking, change management, and issue clarification.
- Strong experience in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, MLlib, Kafka, Flume, MapReduce, and Hive.
- Extensively worked on AWS, EC2, S3 bucket policies, Lambda scripts (Node.js, Python), AWS CloudFormation, and AWS IAM.
COMPUTER PROFICIENCY:
Big Data Ecosystems: Apache Spark 1.6.2/2.0.2 (Spark SQL and DataFrames, Spark Streaming, MLlib, GraphX), Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Pig, Sqoop, Cassandra, Oozie, Flume, Pentaho.
Methodologies: RUP (Rational Unified Process), UML (Unified Modeling Language)
Patterns: GoF, J2EE Core, IoC (Inversion of Control), DI (Dependency Injection)
Languages: Java 1.2/1.3/1.4/5/6/7, C, C++, Perl 5.0/6.0.
JEE Technologies: JSP 2.3, Servlets 3.1, JDBC 4.0, EJB 3.2, Swing, JSF 2.2, JSTL 1.2, JNDI 1.2, JMS 1.1, JavaMail 1.4, JAXP 1.3, JAAS 1.1, Web Services 1.3, JAX-WS 2.0, JAXB 2.0, SAAJ 1.3, StAX 1.0
Application Servers: WebSphere 6.0/6.1/7.0/8.0/8.5, WebLogic 6.0/7.0/8.0, Tomcat 4.1.31/5.5.0, JBoss 4.2, SAP NetWeaver Application Server (Web AS 6.20, Web AS 6.40).
Data Transformation: XML, XSD, XML Schema, XSL/XSLT, JAXP, SAX/DOM.
.NET Technologies: C#, VB.NET, ASP.NET, ADO.NET (MCSD certified).
RDBMS: DB2 UDB 7.2/8.1, Oracle 8i/9i R2/10g, Microsoft SQL Server 7.0/2000/2003/2005, MS Access.
Scripting: Ant 1.8, Maven 2.0, sbt 0.13.
Operating Systems: Windows 8/7/XP, MS-DOS, UNIX (AIX) 5.3, Linux Ubuntu 14.04.3 LTS.
Web Services: SOAP and RESTful (WebSphere, .NET and SAP) Environments.
Source Code Mgmt: Perforce, PVCS, CVS, SVN (Subversion), StarTeam Professional.
IDE: IntelliJ IDEA 15, Eclipse Luna, RAD 6.0/7.0/7.5, RSA 8.0/8.5.5
PROFESSIONAL EXPERIENCE:
Big Data Architect
Confidential
Responsibilities:
- Involved in the complete SDLC of the project, including requirements gathering, design documents, development, testing, and production environments. Packaged the Spark development environment into a custom Vagrant box.
- Architected and designed the entire SLOT application.
- Coded the real-time processing with Spark Streaming and Apache NiFi to store data in Hive and HBase.
- Worked on enabling the HBase and Phoenix installation.
- Led the Elasticsearch 5.2.2, Logstash 5.2.2, Kibana 5.2.2 pipeline for the logs.
- Designed Cassandra tables and Hive tables.
- Designed the Audience Management Service (reads experiment metadata and validates it).
- Extensively worked on H2O and Apache Spark 1.6.2 Machine Learning Libraries.
Environment: Spark 1.6.2 (Spark Streaming, Spark MLlib, Spark GraphX, Spark SQL, Spark DataFrames), Apache NiFi 1.3, Apache HBase 1.2.1, Apache Hive, H2O Sparkling Water 1.6.3, Hadoop Distributions (Hortonworks HDP 2.5.3), Hadoop (MapReduce, YARN, HDFS), Apache Kite, Akka 2.4.x, Scala 2.10/2.11, Python 2.7, Java 8, Kafka 0.10, Apache ZooKeeper, Apache Sqoop 1.4.x, Oracle 12c Release 1 (12.1.0.2), DB2, Elasticsearch 5.2.2, Logstash 5.2.2, Kibana 5.2.2, Apache Cassandra 3.
Big Data Consultant
Confidential, Richmond, VA
Responsibilities:
- Involved in the complete SDLC of the project, including requirements gathering, design documents, development, testing, and production environments.
- Involved in all stages of the data pipeline: Data Acquisition, Parse, Storage, Transform/Explore, Vectorization, Train, Model, Expose, and Presentation.
- ETL: Used Apache NiFi for file transfers to Hadoop. Used Sqoop and Kite for table snapshots and incremental mirroring of data from Oracle and PostgreSQL. Handled Parquet, Avro, JSON, and text file formats. Used Google Snappy compression.
- Extensively worked on AWS, EC2, S3 bucket policies, Lambda scripts (Node.js, Python), AWS CloudFormation, and AWS IAM.
Environment: Spark 1.6.2 (Spark Streaming, Spark MLlib, Spark GraphX, Spark SQL, Spark DataFrames), H2O Sparkling Water 1.6.3, Hadoop Distributions (Cloudera CDH 5.7, Hortonworks HDP 2.4), Hadoop (MapReduce, YARN, HDFS), Apache Kite, Akka 2.4.x, Scala 2.10, Python 2.7, Java 8, Kafka 0.8.2.x, Apache ZooKeeper, Apache Sqoop 1.4.x, Oracle 12c Release 1 (12.1.0.2), PostgreSQL 9.4, Elasticsearch 2.3.0, Logstash 2.3.0, Kibana 4.5.0, Apache Cassandra 3.0.9, Chef, AWS, EC2, S3, Lambda scripts, AWS CloudFormation, AWS IAM, Git, VersionOne.
Hadoop Developer / Big Data Architect
Confidential
Responsibilities:
- Involved in the complete SDLC of the project, including requirements gathering, design documents, development, testing, and production environments.
- Developed optimal strategies to collect data from different sources (internal, external, customer-centric, and product-centric) and organize it into a stream of observable events.
- Involved in building (design, implementation) a data lake from the ground up (Data, Processing, Storage, Agility, Security).
- Analyzed OLTP data, application logs, clickstream, and operations metrics to extract the necessary information from the raw data. Worked with the Spark SQL and DataFrames API.
Environment: Spark 1.5.1 (Spark Streaming, Spark MLlib, Spark GraphX, Spark SQL, Spark DataFrames), Hadoop Distributions (InfoSphere BigInsights 4.1 (IBM’s product), Cloudera CDH 5.5, Hortonworks HDP 2.3), IBM DataStage 8.1 (Designer, Director, Administrator), Hadoop (MapReduce, YARN, HDFS), Apache Kite, Apache NiFi, Akka 2.4.x, Scala 2.10, Java 8, Kafka 0.8.2.x, Apache ZooKeeper, Apache Flume 1.6.0, Apache Sqoop 1.4.x, Elasticsearch 2.0, Tableau 9.0.
Hadoop Developer
Confidential, Chevy Chase, MD
Responsibilities:
- End-to-end Hadoop project execution for various clients worldwide on the IBM platform.
- Worked on integration of Flume with Kafka on PowerLinux for an auto insurance client.
- Worked on PoCs on Apache Spark for streaming data.
- Designed, coded, tested, and deployed various Hadoop solutions for IBM clients.
- Enabled different ISV workloads and Hadoop ecosystems on the PowerLinux ecosystem.
Environment: Hadoop Distributions (InfoSphere BigInsights 3.0 (IBM’s product), Cloudera CDH), IBM DataStage 8.0 (Designer, Director, Administrator), Hadoop (MapReduce, YARN, HDFS), Apache Spark 1.1, HBase, ZooKeeper, Hive, PowerLinux, Shell scripting, AIX, REST APIs, Java, JavaScript.
Java / Hadoop Developer
Confidential, Los Angeles, CA
Responsibilities:
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Analyzed data using Hive queries, Pig scripts, and MapReduce programs.
- Managed and reviewed Hadoop log files.
Environment: Hadoop Distributions (InfoSphere BigInsights 3.0 (IBM’s product), Cloudera CDH), IBM DataStage 8.0 (Designer, Director, Administrator), HDFS, Pig, Sqoop, Hive, MapReduce, Cloudera, Java, Spring, Hibernate, REST, SOAP, WSDL, WebSphere 8.5, Avro, ZooKeeper, HBase, Cassandra, Oracle, Shell Scripting, Ubuntu, Flume, Tableau, Agile, Impala, Linux Red Hat.