Big Data Technical Lead Resume
Santa Clara, California
SUMMARY:
- Over 11 years of professional IT experience spanning specification, design, implementation, debugging, testing, and deployment of complex software applications using big data technologies (Spark/Kafka/Hive/Hadoop), Scala, J2EE, and QT/C++. Excellent technical, problem-solving, client-interaction, and management skills.
- 5 years of experience as a Big Data Technical Lead in distributed computing, with sound knowledge of ingestion (Flume, Kafka, Sqoop), storage (HDFS, HBase), querying (Hive, Pig, Spark SQL), processing (MapReduce, Spark), and machine learning (Spark ML).
- Strong experience developing and operating 24/7 high-performing production streaming applications using Spark Streaming and Kafka.
- Experience with Spark Core, Spark SQL, Spark Streaming, DataFrames, RDDs, and Scala for Spark.
- Experience using DStreams, accumulator variables, broadcast variables, and RDD caching for Spark Streaming (a short sketch follows this summary).
- Experience writing queries for moving data from HDFS to Hive and analyzing data using HiveQL.
- Strong experience in data analytics on big data using Scala, Python, and MLlib.
- Strong experience with databases including MySQL and the NoSQL stores Redis, HBase, and Cassandra.
- Experience in Data Visualization with MS-Excel, FineReport, Zeppelin.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems
- Knowledge of job workflow scheduling and monitoring tools like Oozie and Zookeeper.
- Experience in developing and implementing SOAP and RESTful web services that integrate with multiple applications.
- Experience using enterprise search frameworks such as Solr and Elasticsearch for log analysis.
- Experience in writing Shell Scripts, Map Reduce jobs, Spark jobs to perform ETL operations
- Experience in automating build (Maven, sbt) and development tasks using Shell Scripts, and Python.
- Strong experience and knowledge in Data mining, Text Mining, NLP, Recommendation Systems, Forecasting, Regression, Classification, Clustering and Statistical modeling.
- Very good knowledge of Amazon AWS services such as Lambda, ECS, EMR, EC2, and S3 for fast, efficient processing of big data.
- Experience in middle-tier development using J2EE technologies like JSP, Servlets, EJB, JDBC, JPA, JMS, Struts, Spring, JAXB, JAX-WS and JAX-RS.
- Experience in collaborating with different overseas teams (USA, Europe, Australia and China) to meet delivery deadlines.
- Strong experience in the programming languages C and C++, and in QT.
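Illustrative only: a minimal Scala sketch of the Spark Streaming pattern referenced above (a direct Kafka DStream plus a broadcast lookup); the broker address, topic name, and record layout are assumed placeholders, not production code.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object StreamingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-dstream-sketch")
        val ssc  = new StreamingContext(conf, Seconds(10))               // 10-second micro-batches

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",                        // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "example-group",
          "auto.offset.reset"  -> "latest")

        // Broadcast a small lookup table so every executor can enrich records locally.
        val channelNames = ssc.sparkContext.broadcast(Map("01" -> "Vmall", "02" -> "AppStore"))

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        stream.map(_.value)                                              // raw CSV payload
          .map(line => (line.split(",")(0), 1L))                         // channel code assumed in field 0
          .reduceByKey(_ + _)
          .map { case (code, cnt) => (channelNames.value.getOrElse(code, code), cnt) }
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }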
TECHNICAL SKILLS:
Programming Languages: Java, Scala, Python, C++, C
Big Data Platform Distribution: Confidential Fusion Insight, Cloudera, HDP
Hadoop Eco-system: Pig, Hive, HBase, Sqoop, Flume, Zookeeper, Oozie, Hue, Yarn
Big Data Frameworks: Hadoop, Solr, Spark, Kafka
J2EE Technologies: Servlets, JSP, JSTL, EJB, JDBC, JMS, JNDI, RMI, JAX-WS, JAX-RS, Log4J
Cloud Services: Amazon Web Services, Confidential Cloud
Application Security: OAUTH, Kerberos
Interoperability: SOAP and Restful Web Services
Scripting Languages: Pig, Perl, Shell script, sed
Data mining/ML Tools: Mahout, Python, MLlib.
Reporting and Visualization Tools: MS Excel, Zeppelin, FineReport
Schedulers: Oozie, TCC (Confidential internal tool)
Message Queues/Brokers: Kafka
Packaging/Deployment: Maven, Gradle, Ant
Version Control tools: Tortoise SVN, Git, GitHub
NoSQL Databases (column-oriented): HBase, Cassandra
Relational Databases: MySQL, SQLite, Confidential MPP DB
Application/Web Servers: Tomcat
Search Frameworks: Elasticsearch, Solr, Lucene
Operating Systems: Windows 7/8, Linux, Ubuntu, Android, Mac OS X
IDE Tools: IntelliJ IDEA, Eclipse, PyCharm, Scala IDE, QT Creator, Visual Studio
Testing Frameworks/Tools: Junit, MRUnit, JMeter
Desktop Application Framework: QT, Visual Studio
Web Designing Tools: JSP
DevOps Tools: Jenkins, Git, Ansible
Agile Project Management, Quality Processes and Defect Tracking Tools: JIRA, Rally, Six Sigma Green Belt, Confidential DTS Tool
Installer Tools: NSIS, MSI
PROFESSIONAL EXPERIENCE:
Confidential, Santa Clara, California
Big Data Technical Lead
Languages & Tools: Scala, Java/J2EE, Flume, Kafka, Spark SQL, Spark Streaming, Spark ML, Hive, Hadoop, Web services, Tomcat, Maven, Confidential Fusion Insight (customized Cloudera/HDP)
Responsibilities:
- Designed star-schema data lakes/data stores over data ingested from Confidential channel networks such as Vmall, Music, Video, App Store, Ad SDK, and third-party sources.
- Partitioned the ingested data to improve data-processing speed.
- Deployed Flume to efficiently collect real-time data.
- Processed multiple Kafka topics using Scala.
- Processed real-time data using Spark Streaming (DStreams) and Scala.
- Analyzed data and developed data models using Spark and Scala.
- Implemented Hive scripts to analyze batch data for tag and audience generation.
- Developed web services providing APIs for DSP/SSP systems.
- Implemented machine learning algorithms to efficiently tag metadata and generate audience profiles.
- Implemented shell scripts to export data from Hive to MySQL for DSP/SSP platform queries (a Spark-based variant of this export is sketched after this list).
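A hedged sketch of the Hive-to-MySQL export step described above, written as a Spark SQL job rather than the shell scripts actually used; table names, columns, dates, and connection details are illustrative assumptions.

    import java.util.Properties
    import org.apache.spark.sql.SparkSession

    object AudienceExportSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("audience-export-sketch")
          .enableHiveSupport()                                  // read the Hive warehouse directly
          .getOrCreate()

        // Aggregate tagged events from a Hive table into per-user audience segments.
        val audiences = spark.sql(
          """SELECT user_id, tag, count(*) AS hits
            |FROM dwh.tagged_events
            |WHERE dt = '2018-01-01'
            |GROUP BY user_id, tag""".stripMargin)

        // Push the result to MySQL so DSP/SSP-facing services can query it.
        val props = new Properties()
        props.setProperty("user", "etl_user")                   // placeholder credentials
        props.setProperty("password", "***")
        props.setProperty("driver", "com.mysql.jdbc.Driver")

        audiences.write
          .mode("overwrite")
          .jdbc("jdbc:mysql://db-host:3306/ads", "audience_segments", props)

        spark.stop()
      }
    }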
Confidential
Big Data Technical Lead
Languages & Tools: Hadoop, HDFS, MapReduce, Hive, Pig, Python, HBase, Oozie, YARN, Core Java, Oracle, SQL, Ubuntu/Unix, Eclipse, Maven, JDBC drivers, MySQL, Linux, AWS, XML, SVN, Putty, Spark, Scala
Responsibilities:
- Wrote MapReduce jobs to write the full history of public sourcing documents (3 billion documents) to HBase and Elasticsearch.
- Loaded historical as well as incremental customer and other data into Hadoop through Hive. Applied the required business logic to the data in Hive and generated the required output as flat files.
- Wrote a JSON API to retrieve the public sourcing documents from Elasticsearch.
- Worked on building the big data infrastructure stack, including Elasticsearch, the Hadoop stack (HDFS, MapReduce, HBase, ZooKeeper), and a private data cloud.
- Processed multiple Kafka topics using Scala.
- Processed real-time data using Spark Streaming (DStreams) and Scala.
- Analyzed data and developed data models using Spark and Scala.
- Involved in data modeling in Hadoop.
- Implemented a Kafka producer application (a minimal producer sketch follows this list).
- Developed Oozie schedules for batch execution.
- Developed shell scripts to calculate the parameters needed for the Oozie flow and start the batch execution.
- Moved data from Hadoop/Hive to Elasticsearch.
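A minimal sketch of a Kafka producer of the kind mentioned above, using the standard kafka-clients API; the broker address, topic name, and payload are placeholders, not the production application.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    object DocumentProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")                    // placeholder broker list
        props.put("key.serializer", classOf[StringSerializer].getName)
        props.put("value.serializer", classOf[StringSerializer].getName)
        props.put("acks", "all")                                          // wait for full replication

        val producer = new KafkaProducer[String, String](props)
        try {
          // Each sourcing document is sent as a JSON string keyed by its document id.
          producer.send(new ProducerRecord[String, String](
            "sourcing-documents", "doc-001", """{"id":"doc-001"}"""))
        } finally {
          producer.flush()
          producer.close()
        }
      }
    }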
Confidential
Big Data Technical Lead
Languages & Tools: Hadoop, HDFS, MapReduce, Hive, Pig, Scala, Python, HBase, Oozie, YARN, Spark, Core Java, Oracle, SQL, Ubuntu/Unix, Eclipse, Maven, JDBC drivers, Mainframe, MySQL, Linux, AWS, XML, CRM, SVN, PDSH, Putty, BigInsights
Responsibilities:
- Created the project using Hive, BigSQL, and Pig.
- Implemented partitioning and bucketing in Hive (illustrated in the sketch after this list).
- Involved in data modeling in Hadoop.
- Created Hive tables and worked on them using HiveQL.
- Wrote Apache Pig scripts to process HDFS data.
- Created Java UDFs in Pig and Hive.
- Designed end-to-end ETL workflows using Hadoop.
- Participated in backup and recovery of Hadoop file system.
- Automated tasks using UNIX shell scripts.
- Analyzed requirements and prepared solutions for each requirement.
- Gathered business requirements from business partners and subject-matter experts.
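The partitioning and bucketing in this project was defined with HiveQL DDL; as an illustrative stand-in, the equivalent Spark/Scala DataFrame form is shown below (table, column, and bucket counts are assumptions).

    import org.apache.spark.sql.SparkSession

    object PartitionBucketSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partition-bucket-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Placeholder staging table; in practice the data came from the raw HDFS landing zone.
        val orders = spark.table("staging.orders_raw")

        // Partition by load date and bucket by customer_id so queries that filter on dt
        // and join on customer_id read only a fraction of the data.
        orders.write
          .partitionBy("dt")
          .bucketBy(32, "customer_id")
          .sortBy("customer_id")
          .format("parquet")
          .mode("overwrite")
          .saveAsTable("dwh.orders")

        spark.stop()
      }
    }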
Confidential
Big Data Technical Lead
Languages & Tools: Java, HDFS, MapReduce, Hive, Confidential Fusion Insight (customized Cloudera/HDP), MS Excel, MySQL, FineReport
Responsibilities:
- The platform supports batch data processing on a 320-node production cluster with a daily data ingest of 35 TB, processing 200 billion records on a typical day. Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured Hive, HDFS, ZooKeeper, and MapReduce on the Hadoop cluster.
- Developed tool to securely transfer batch data from different application servers to HDFS for analysis.
- Implemented reports using MS Excel and the FineReport tool. Processed JSON, CSV, and Parquet data formats.
- Implemented Hive scripts and MapReduce programs based on the business requirements to process data from different systems. Implemented custom UDFs to support the business and meet data-security requirements (see the sketch after this list).
- Implemented scripts to export data from MySQL (application servers) to Hive and on to MySQL (business servers). Implemented a scheduler framework to schedule the batch scripts for daily data ingestion and processing in the production environment.
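A hedged Spark/Scala sketch of the format handling and masking-UDF idea mentioned above; the paths, column names, and masking rule are assumptions for illustration, not the production logic.

    import org.apache.spark.sql.SparkSession

    object FormatAndUdfSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("format-udf-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Example masking UDF of the kind used for data-security requirements:
        // keep only the last four characters of an identifier.
        spark.udf.register("mask_id", (s: String) =>
          if (s == null) null else "*" * math.max(0, s.length - 4) + s.takeRight(4))

        // The same pipeline can ingest JSON drops; CSV and Parquet use
        // spark.read.option("header", "true").csv(...) and spark.read.parquet(...).
        val events = spark.read.json("/data/ingest/events_json/")   // placeholder HDFS path

        events.createOrReplaceTempView("events")
        spark.sql(
          """SELECT mask_id(device_id) AS device_id, event_type, count(*) AS cnt
            |FROM events
            |GROUP BY mask_id(device_id), event_type""".stripMargin)
          .write.mode("overwrite").saveAsTable("dwh.daily_device_events")

        spark.stop()
      }
    }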
Confidential
Big Data Technical Lead
Languages & Tools: C++, QT, MySQL
Responsibilities:
- Design, Coding, Testing.
- Developed QT/C++ components to show the different device-detection statuses in data cards (modems).
- Trained new team members.
- Acted as module lead for multiple modules in the team.
- Performed custom QT compilation to reduce the library size.
- Maintained CI for agile development and automated the build and release procedure.
Confidential
Big Data Technical Lead
Languages & Tools: C++, QT, MySQL
Responsibilities:
- Involved in design of the GUI.
- Involved in development of a GIS-based desktop application using GIS libraries, C++, and QT4 integrated with MSVS2005 and MySQL.
Confidential
Big Data Technical Lead
Languages & Tools: C++, QT, MySQL
Responsibilities:
- Involved in design of the GUI.
- Involved in development of the GUI and core application using C++ and QT4 integrated with MSVS2005.
- Maintained the database using SQLite to store all configured data and plotted the data acquired over RS232.