Technical Lead Resume
San Jose, CA
SUMMARY
- Software professional with about 8 years of experience as a Cloud/Big Data/Python developer, covering the analysis, design, development, testing, and implementation of data warehouse solutions.
- Experienced in Snowflake (cloud-based) data warehousing and ETL processes. Strong database modeling, SQL, ETL and data analysis skills.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce.
- In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG scheduler, task scheduler, stages, and tasks.
- Expertise in writing Hadoop Jobs for analyzing data using MapReduce, Hive and Pig.
- Worked on real-time, in-memory tools such as Spark, Spark SQL, PySpark, Impala and integration with BI Tools such as Tableau.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Experience in Python data analysis and modeling libraries such as Pandas, NumPy, SciPy, and Matplotlib.
- Worked on R programming for forecasting data.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs, UDAFs and MapReduce scripts using Java and Python.
- Experienced in NoSQL databases such as HBase and MongoDB.
- Exposure on Apache Kafka to develop data pipeline of logs as a stream of messages using producers and consumers.
- Experienced in job workflow scheduling and monitoring tools like Oozie, Tidal Enterprise Scheduler, and Control-M.
- Experienced in developing applications using Java/J2EE technologies such as Core Java, Servlets, JSP, and JDBC.
- Knowledge of administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig.
- Experience with developing large-scale distributed applications.
- Experience in developing solutions to analyze large data sets efficiently.
- Experience in Data Warehousing and ETL processes.
- Knowledge of star schema and snowflake schema modeling, fact and dimension tables, and physical and logical modeling.
- Knowledge of Amazon Web Services (AWS) and Google Cloud Platform (GCP).
- Good understanding of Data Mining and Machine Learning techniques.
- Strong knowledge of POSIX permissions and Apache Sentry access controls in Hadoop.
- Strong communication and analytical skills with very good programming and problem-solving experience.
- Ability to learn and adapt quickly and to correctly apply new tools and technology.
TECHNICAL SKILLS
- Big Data: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Kafka, Oozie, Zookeeper
- Cloud: Snowflake, Google Cloud Platform (GCP)
- Spark: Spark, Spark SQL, PySpark
- NoSQL Databases: MongoDB, HBase
- Python Libraries: Python, Pandas, Requests, JSON, NumPy, SciPy, Matplotlib
- IDEs: Jupyter Notebook, PyCharm, Eclipse
- Java: Java, J2EE, Spring, Hibernate, Servlets, JSP
- Databases & ETL: Teradata, MS SQL Server, Oracle, MySQL, Informatica, DataStage
- Reporting & Visualization: Splunk, Tableau
- Version Control & Tracking: Git, SVN, CVS, GitHub, GitLab, Bitbucket, JIRA
- CI/CD: Jenkins, Ansible
- Operating Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE
Confidential, San Jose, CA
Technical Lead
Responsibilities:
- Responsible for business requirements gathering, system requirement analysis, cross-functional coordination and communication, development, testing, test case creation and review, business acceptance testing support, and deployment.
- Implemented the Snowflake migration strategy, roadmap, and proof of concept for moving from Hadoop to the Snowflake cloud.
- Built the logical and physical data models for Snowflake as per the required changes.
- Worked on creating SnowSQL queries and Snowpipe for continuous data load.
- Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake’s SnowSQL (see the load sketch after this list).
- Used Snowflake Clone, Share, and Time Travel for testing.
- Worked on Temporary and Transient tables on different datasets as per the needs in Snowflake.
- Develop Python logic to import data from multiple sources when the requirement is to consolidate them into one structure/table.
- Develop Hive scripts to join multiple tables and create an aggregate table as per the requirement, leveraging built-in Hive window and analytic functions.
- Develop PySpark logic to transform the data into the aggregate table when the source tables are large and need in-memory processing.
- Develop shell scripts to call Splunk server APIs using the curl command to process log/unstructured/semi-structured data and store them into a target table.
- Develop an ETL (Extract, Transform, Load) monitoring framework that analyzes job specifics and makes recommendations before scheduled jobs are committed, saving time and effort; this helps optimize the ETL process and reduce job failures, resulting in more stable systems.
- Implement enhancements in development procedures leveraging Spark and Spark SQL for dealing with large data sets, retrospective analysis of process control techniques, and advanced asynchronous architectures for in-memory processing, better performance, and scalability.
- Develop MongoDB queries for CRUD (create, read, update, delete) operations and the aggregation framework, including lookup queries to join multiple collections as per the requirement (see the aggregation sketch after this list).
- Implement REST APIs using Node.js and Express.js with MongoDB as a data layer for complex logic reporting.
- Create Control-M jobs in the development environment as per the business requirement refresh frequency and migrate to Stage and Production.
- Design and build best-in-class production processes that ensure security, efficiency, and maximum availability of the application and data.
- Document each step of the development to ensure adequate communication within the team and customers.
- Work with the offshore team and share design logic with them for the implementation of requirements.
- Coordinate review of the developed system with business users before production implementation.
- Analyze the applications for performance issues and design and develop solutions to improve performance.
- Participate in Code deployment on various environments (Development, Stage, and Production).
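A minimal sketch of the Python-to-Snowflake load step referenced above, assuming the snowflake-connector-python package; the account, credentials, table, and file names are illustrative placeholders rather than project details.

```python
# Minimal Python -> Snowflake load sketch using snowflake-connector-python.
# Account, credentials, table, and file names below are hypothetical.
import snowflake.connector

def load_daily_extract(csv_path: str) -> None:
    conn = snowflake.connector.connect(
        account="my_account",    # hypothetical account identifier
        user="etl_user",         # hypothetical service user
        password="***",          # would come from a secrets manager in practice
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="STAGING",
    )
    try:
        cur = conn.cursor()
        # Upload the local extract to the table's internal stage ...
        cur.execute(f"PUT file://{csv_path} @%DAILY_SALES OVERWRITE = TRUE")
        # ... then load it; a Snowpipe definition wraps the same COPY INTO
        # statement for continuous loads.
        cur.execute(
            "COPY INTO DAILY_SALES FROM @%DAILY_SALES "
            "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
        )
    finally:
        conn.close()

if __name__ == "__main__":
    load_daily_extract("/tmp/daily_sales.csv")
```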
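A hedged sketch of the kind of MongoDB lookup/aggregation query mentioned above, using pymongo; the collections (orders, customers) and field names are hypothetical examples for illustration only.

```python
# Illustrative MongoDB aggregation joining two collections with $lookup.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
db = client["reporting"]

pipeline = [
    # Join each order with its customer document.
    {"$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }},
    {"$unwind": "$customer"},
    # Aggregate order totals per customer region for the reporting layer.
    {"$group": {
        "_id": "$customer.region",
        "total_amount": {"$sum": "$amount"},
        "order_count": {"$sum": 1},
    }},
    {"$sort": {"total_amount": -1}},
]

for row in db.orders.aggregate(pipeline):
    print(row)
```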
Confidential, San Jose, CA
Hadoop/Spark Technical Lead
Responsibilities:
- Responsible for gathering requirements, analysis, design, development, testing and deployment.
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including data from Excel, RDBMS, MongoDB, APIs, and log data from Splunk servers.
- Developed automated scripts using Sqoop and Python for ingesting data from Oracle, Teradata, and MySQL on daily and bi-weekly refresh cycles.
- Created Sqoop jobs with incremental load to populate Hive external tables.
- Worked with highly unstructured and semi-structured data.
- Developed Pig, Hive, and PySpark scripts to transform raw data from several data sources into baseline data.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Worked in tuning Hive and Pig scripts to improve performance.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Worked on Spark RDD for data pre-processing.
- Developed Spark and Spark-SQL code for data quality checks and faster processing of data.
- Wrote Python utility functions to extract data from various databases such as Oracle, MySQL, and MongoDB.
- Worked on Python scripts to parse JSON and XML documents and load the data into Hive tables.
- Worked on loading CSV, TXT, Avro, and Parquet files using Python in the Spark framework, processing the data with Spark DataFrames and RDDs and saving the output in Parquet format in HDFS for loading into fact tables (see the PySpark sketch after this list).
- Migrated HiveQL queries on structured data into Spark SQL to improve the performance.
- Worked on Splunk REST APIs to load log data from Splunk servers to HDFS.
- Worked on Splunk queries for data validation and quality checks.
- Loaded, processed, and aggregated data in MongoDB using Hive-Mongo integration for reporting.
- Wrote Mongo aggregation queries to visualize the data in the reporting layer.
- Wrote R scripts for forecasting future data using Hive tables.
- Worked on UNIX/LINUX shell scripting.
- Developed TES workflow for scheduling and orchestrating the ETL process.
- Wrote MFS commands to maintain Hive table-level security.
- Worked on JAVA and J2EE applications.
- Worked with Spring framework for implementing web services, integrating other frameworks easily and maintaining code consistency.
- Used Hibernate in the DAO layer to access and update information in the Oracle database, developed Hibernate configuration files (hbm.xml) for object-relational mapping with the database, and fine-tuned performance by optimizing queries and data caching mechanisms.
- Integration and delivery using Jenkins, Git, and Bitbucket.
- Actively participated in code reviews and meetings and resolved technical issues.
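An illustrative PySpark sketch of the file-to-Parquet flow described above; the HDFS paths, column names, and the sample transformation are assumptions, not project specifics.

```python
# Illustrative PySpark job: read a delimited extract, apply a light
# transformation, and write Parquet back to HDFS for fact-table loads.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("raw_to_parquet").getOrCreate()

# Schema inference is used here for brevity; an explicit schema is
# preferable for production loads.
raw = (spark.read
       .option("header", "true")
       .option("inferSchema", "true")
       .csv("hdfs:///data/raw/sales/*.csv"))   # placeholder path

# Example transformation: standardize a date column and drop bad rows.
clean = (raw
         .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
         .dropna(subset=["order_id"]))

# Save as Parquet in HDFS, partitioned by date, ready to load into facts.
(clean.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("hdfs:///data/curated/sales_parquet"))  # placeholder path

spark.stop()
```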
Confidential, San Jose, CA
Hadoop/Spark Developer
Responsibilities:
- Part of IT Development team involved in ingesting data from external sources.
- Developed automated scripts using Sqoop for ingesting data from Oracle, Teradata, and MySQL on daily and bi-weekly refresh cycles (see the Sqoop sketch after this list).
- Worked with highly unstructured and semi-structured data of about 50 TB in size.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Developed Hive scripts for end user / analyst requirements for ad-hoc analysis.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Wrote R scripts for forecasting future data using Hive tables.
- Experienced in querying data using Spark SQL on top of Spark engine.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Worked in tuning Hive and Pig scripts to improve performance.
- Developed UDFs and UDAFs in Java as needed for use in Pig and Hive queries.
- Extracted the data from Teradata into HDFS using Sqoop.
- Created Sqoop job with incremental load to populate Hive External tables.
- Developed TES workflow for scheduling and orchestrating the ETL process.
- Exported Hive tables to MongoDB application.
- Wrote MFS commands to provide Hive table-level access permissions.
- Involved in gathering business requirements and preparing detailed specifications that follow project guidelines for program development.
- Actively participated in code reviews and meetings and resolved technical issues.
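A sketch of how a Sqoop incremental import can be automated from a Python script, as referenced above; the JDBC connection string, table name, HDFS target directory, and last-value handling are hypothetical placeholders.

```python
# Sketch: drive a Sqoop incremental import from Python via subprocess.
# All connection details, names, and paths below are placeholders.
import subprocess

SQOOP_CMD = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//db-host:1521/ORCL",  # hypothetical source
    "--username", "etl_user",
    "--password-file", "/user/etl/.oracle_pwd",
    "--table", "SALES",
    "--target-dir", "/data/external/sales",  # HDFS dir backing a Hive external table
    "--incremental", "append",
    "--check-column", "SALE_ID",
    "--last-value", "0",                      # tracked per run in practice
    "--num-mappers", "4",
]

def run_incremental_import() -> None:
    # A scheduler (TES, Control-M, or Oozie) would normally invoke this script.
    result = subprocess.run(SQOOP_CMD, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")
    print("Sqoop incremental import finished")

if __name__ == "__main__":
    run_incremental_import()
```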
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Involved in all phases of Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation and support for Hadoop.
- Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Sqoop, Hive and Pig.
- Created Hive tables and implemented incremental imports to perform ad-hoc queries on structured data.
- Analyzed data and wrote complex Hive queries to meet the business requirements.
- Built reusable Hive UDF libraries written in Java for business requirements which enabled users to use these UDF’s in Hive queries.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Loaded CSV files, Pipe delimited files into HDFS and Hive tables.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the RDD sketch after this list).
- Developed MapReduce programs in Java to perform various ETL, cleaning, and scrubbing tasks.
- Experienced in managing and reviewing Hadoop log files.
- Involved in designing architecture of Hive databases and workflows.
- Exported Hive table data to Oracle database using Sqoop export.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Adept at applying partitioning and bucketing concepts in Hive to optimize performance.
- Developed TES workflow for scheduling and orchestrating the ETL process.
- Created Unix Shell scripts to audit Oracle, Teradata and Hive table counts using Sqoop.
- Involved in loading data from UNIX file system and SFTP server to HDFS.
- Used Impala and wrote queries to fetch data from Hive tables.
- Actively involved in code review and bug fixing for improving performance.
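A hedged example of translating a simple Hive aggregation (SELECT category, SUM(amount) ... GROUP BY category) into Spark RDD transformations with Python, as described above; the input path and column layout are assumed for illustration.

```python
# Translate a simple Hive GROUP BY aggregation into Spark RDD operations.
from pyspark import SparkContext

sc = SparkContext(appName="hive_to_rdd")

# Each input line is assumed to be comma-delimited: id,category,amount
lines = sc.textFile("hdfs:///data/raw/sales.csv")  # placeholder path

totals = (lines
          .map(lambda line: line.split(","))
          .filter(lambda cols: len(cols) == 3)            # drop malformed rows
          .map(lambda cols: (cols[1], float(cols[2])))    # (category, amount)
          .reduceByKey(lambda a, b: a + b))               # SUM(amount) GROUP BY category

for category, total in totals.collect():
    print(category, total)

sc.stop()
```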
Confidential, San Jose, CA
Hadoop Developer
Responsibilities:
- Developing automated scripts for ingesting around 200 TB of data from Teradata on a bi-weekly refresh cycle.
- Working with highly unstructured and semi-structured data of about 50 TB in size.
- Creating tables using partitioning and bucketing concepts in Hive and designing both managed and external tables for optimized performance (see the DDL sketch after this list).
- Reviewed performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation.
- Developing Hive and Pig scripts to improve performance.
- Developed UDFs in Java as needed for use in Pig and Hive queries.
- Extracted the data from Teradata into HDFS using Sqoop.
- Participated in architecture design meetings in order to understand the architectural framework.
- Followed object-oriented software development methodologies and practices.
- Deployed applications to J2EE application servers on Linux operating systems.
- Implemented MVC architecture using Struts according to J2EE standards.
- Used Eclipse as the IDE, with an Eclipse plug-in to integrate the Tomcat server, alongside enterprise-class solutions such as application servers, Java, J2EE, Spring, Hibernate, JBoss, XML, and web services.
- Reviewed bugs with the QA team to determine defect validity.
- Responsible for creating design documents outlining requirement specifications and interacting with business users and business analysts to design prototypes.
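A minimal sketch of the partitioned, bucketed external Hive table design mentioned above, issued here through the PyHive client (an assumption; the same DDL can be run from the hive or beeline CLI); host, table, and column names are illustrative.

```python
# Create a partitioned, bucketed external Hive table via PyHive.
# Host, table, columns, and HDFS location below are hypothetical.
from pyhive import hive

DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales_ext (
    sale_id      BIGINT,
    customer_id  BIGINT,
    amount       DOUBLE
)
PARTITIONED BY (sale_date STRING)          -- prunes scans by date
CLUSTERED BY (customer_id) INTO 32 BUCKETS -- evens out joins and sampling
STORED AS ORC
LOCATION '/data/external/sales_ext'
"""

conn = hive.Connection(host="hive-server", port=10000, username="etl_user")
cur = conn.cursor()
cur.execute(DDL)
conn.close()
```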
Confidential
Java Developer
Responsibilities:
- Analyzing the business requirements, performing gap analysis, and transforming them into detailed design specifications.
- Involved in design process using UML & RUP (Rational Unified Process).
- Performed Code Reviews and responsible for Design, Code and Test signoff.
- Assisting the team in development, clarifying on design issues and fixing the issues.
- Involved in designing test plans, test cases and overall Unit and Integration testing of system.
- Development of the logic for the Business tier using Session Beans (Stateful and Stateless).
- Developed Web Services using JAX-RPC, JAXP, WSDL, SOAP, XML to provide facility to obtain quote, receive updates to the quote, customer information, status updates and confirmations.
- Extensively used SQL queries, PL/SQL stored procedures & triggers in data retrieval and updating of information in the Oracle database using JDBC.
- Expert in writing, configuring, and maintaining Hibernate configuration files and in writing and updating Hibernate mapping files for each Java object to be persisted.
- Expert in writing Hibernate Query Language (HQL) queries and tuning Hibernate queries for better performance.
- Used the design patterns such as Session Façade, Command, Adapter, Business Delegate, Data Access Object, Value Object and Transfer Object.
- Deployed the application in WebLogic and used WebLogic Workshop for development and testing.
- Involved in application performance tuning (code refactoring).
- Wrote test cases using JUnit, practicing test-first development.
- Used Rational ClearCase and PVCS for source control and ClearQuest for defect management.
- Wrote build files using Ant and used Maven in conjunction with Ant to manage builds.
- Running the nightly builds to deploy the application on different servers.