Data Science Resume
San Francisco, CA
SUMMARY:
- 9+ years of experience in various IT-related technologies, including 4 years of hands-on experience in Big Data technologies.
- Extensive implementation and working experience with a wide array of tools in the Big Data stack, such as HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper, and HBase.
- Proficient in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and ZooKeeper, as well as AWS.
- Strong understanding of Hadoop daemons and MapReduce concepts.
- Used Informatica PowerCenter for Extraction, Transformation, and Loading (ETL) of data from numerous sources such as flat files, XML documents, and databases.
- Experienced in developing UDFs for Pig and Hive using Java.
- Strong knowledge of Spark with Scala for large-scale streaming data processing.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (a minimal sketch follows this summary).
- Performed unit testing with JUnit test cases and integration of developed code.
- Worked with NoSQL databases such as HBase, Cassandra, and MongoDB to extract and store large volumes of data.
- Knowledgeable in implementing Hortonworks (HDP 2.1 and HDP 2.3) and Cloudera (CDH3, CDH4, CDH5) distributions on Linux.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Experience in designing star schemas and snowflake schemas for data warehouse and ODS architectures.
- Ability to develop MapReduce programs using Java and Python.
- Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
- Good understanding of and exposure to Python programming.
- Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
- Exported and imported data to and from Oracle using SQL Developer for analysis.
- Good experience using Sqoop for traditional RDBMS data pulls.
- Worked with different distributions of Hadoop, such as Hortonworks and Cloudera.
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
- Extensive experience in Shell scripting.
- Extensive use of open-source software and web/application servers such as the Eclipse 3.x IDE and Apache Tomcat 6.0.
- Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams for the requirements.
- Involved in report development using reporting tools such as Tableau; used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.
- Broad design, development, and testing experience with Talend Integration Suite, and knowledge of performance tuning of mappings.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Experience with cluster monitoring tools such as Ambari and Apache Hue.
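A minimal sketch of the Kafka-to-Spark Streaming integration described above, using PySpark Structured Streaming; the broker address, topic name, and console sink are placeholder assumptions for illustration only.

    from pyspark.sql import SparkSession

    # Build a Spark session (assumes the spark-sql-kafka connector is on the classpath).
    spark = (SparkSession.builder
             .appName("kafka-spark-streaming-sketch")
             .getOrCreate())

    # Read a Kafka topic as an unbounded streaming DataFrame.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
              .option("subscribe", "events")                      # placeholder topic
              .load())

    # Kafka delivers key/value as bytes; cast to strings before downstream parsing.
    parsed = events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    # Write to the console sink just to verify the pipeline end to end.
    query = (parsed.writeStream
             .format("console")
             .outputMode("append")
             .start())
    query.awaitTermination()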
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Storm, Apache Spark, NiFi, Hortonworks
NoSQL / Databases: HBase, Cassandra, MongoDB, DynamoDB, SQL
Languages: Java, Python, Scala, Shell Scripting, JavaScript, HTML, XML, JSON, AJAX
Java / J2EE: J2EE, JSP, Servlets, JSF, JDBC, Applets, Swing, Struts, Spring, Spring Boot, Hibernate, MVC, jQuery
Testing: JUnit, MRUnit
Cloud, Servers & Tools: AWS, Docker, Eclipse, Elasticsearch, JBoss, Apache Tomcat, Tableau
Operating Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Data Science
Responsibilities:
- Worked on transforming data using StreamSets; created multiple pipelines from various source points.
- Used Google Cloud Storage and Amazon S3 to store data collected from Pub/Sub and various vendors.
- Worked on BigQuery.
- Worked on Python scripting for web scraping using Beautiful Soup (a minimal sketch follows this list).
- Imported tables in formats such as Avro, CSV, and JSON from GCS or local storage into BigQuery, and performed queries on the imported tables.
- Worked on Databricks to analyze data using Spark SQL, SQL queries, and Scala with Spark.
- Worked on streaming data using StreamSets; collected data from Pub/Sub and stored it in Google Cloud Storage.
- Worked on multi-tenancy in Looker, representing data using various standard and custom graphs; created multiple dashboards for multiple pipelines.
- Used Jira for creating tickets and followed the Agile Scrum methodology.
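A minimal sketch of the Beautiful Soup web scraping mentioned in this list; the URL and the table-row structure are placeholder assumptions, not details of the actual vendor pages.

    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/listings"  # placeholder URL

    # Fetch the page and parse the returned HTML.
    response = requests.get(URL, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Collect the cell text of every table row into a list of records.
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)

    print(f"Scraped {len(rows)} rows")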
Confidential, Rochester, MN
Hadoop Developer
Environment: Hadoop, Hortonworks, Spark, YARN, Elasticsearch, Hive/SQL, Scala, Ambari, Pig, HCatalog, MapReduce, HDFS, Sqoop, Talend, EC2, ELB, S3, Glacier, Kafka, Storm, ETL, Informatica, DB2, NiFi, Agile, JUnit, MRUnit
Responsibilities:
- Used the Spark API over Hortonworks on AWS Linux servers to perform analytics on data.
- Performed a Hadoop cluster upgrade from HDP 2.1 to HDP 2.3.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Set up a multi-node cluster; planned and deployed a Hadoop cluster using Hortonworks Ambari.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Created, configured, and implemented a Virtual Private Cloud (VPC), security groups, Network Access Control Lists (NACL), Elastic Compute Cloud (EC2) instances, an Elastic Load Balancer (ELB), and Route 53 DNS.
- Created Elastic Block Store (EBS) volumes to retain instance data even if the EC2 instance is terminated.
- Created snapshots of instances and stored the data in S3.
- Created lifecycle rules for moving data between storage classes: from S3 to S3 Infrequent Access, and from S3 to Glacier for data that is accessed very rarely.
- Stored structured data in RDS and DynamoDB.
- Configured and implemented CloudTrail and CloudFront, and monitored with CloudWatch.
- Involved in messaging services such as SNS and SQS.
- Used HCatalog to build a relational view of the data.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance.
- Experience in pushing data from Impala to MicroStrategy.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data from different sources into Hive using the Talend tool.
- Implemented real-time data ingestion using Kafka.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used all major ETL transformations to load tables through Informatica mappings.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Implemented a data flow engine for smart data ingress and egress using Apache NiFi.
- Involved in ingesting data into HDFS using Apache NiFi.
- Developed and deployed Apache NiFi flows across various environments, optimized NiFi data flows, and wrote QA scripts in Python for tracking missing files (a minimal sketch follows this list).
- Worked on managing and reviewing Hadoop log files; tested and reported defects from an Agile methodology perspective.
- Used the JUnit, EasyMock, and MRUnit testing frameworks to develop unit test cases.
- Coordinated with the business for UAT sign-off.
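A minimal sketch of a Python QA check for missing files, as referenced in the NiFi bullet above; the expected file names and the HDFS landing path are placeholder assumptions, and the script simply shells out to the standard hdfs CLI.

    import subprocess

    EXPECTED_FILES = {"feed_20160101.csv", "feed_20160102.csv"}  # placeholder names
    HDFS_DIR = "/data/landing/feed"                              # placeholder path

    # List the landing directory with the hdfs CLI and keep just the file names.
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", HDFS_DIR],
        capture_output=True, text=True, check=True,
    ).stdout

    landed = {line.rsplit("/", 1)[-1] for line in listing.splitlines() if HDFS_DIR in line}

    # Report anything that was expected but has not arrived.
    missing = sorted(EXPECTED_FILES - landed)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")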
Confidential, Schaumburg, IL
Hadoop Developer
Environment: Hadoop, Pig, Hive, MapReduce, Flume, HDFS, AWS, Dynamo DB, PySpark, HBase, Spring Boot, Linux, Sqoop, Python, Oozie, Nagios, Ganglia, EC2, EBS, ELB, S3
Responsibilities:
- Worked on a Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Worked in an AWS environment for developing and deploying custom Hadoop applications.
- Extracted and stored data in DynamoDB for use by the Hadoop application.
- Generated pipelines using PySpark and Hive (a minimal sketch follows this list).
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Experience in developing Java applications using Spring Boot.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience working on processing unstructured data using Pig and Hive.
- Developed Spark scripts using Python.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Assisted in monitoring the Hadoop cluster using tools such as Nagios and Ganglia.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in configuring and administering EC2 instances, EBS volumes, snapshots, and Elastic Load Balancers (ELB).
- Involved in creating and launching instances in EC2; created snapshots and stored them in S3.
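A minimal sketch of a PySpark-and-Hive pipeline of the kind referenced in this list; the table and column names are placeholder assumptions, and Hive support is assumed to be enabled for the Spark session.

    from pyspark.sql import SparkSession, functions as F

    # Enable Hive support so Spark can read and write Hive tables directly.
    spark = (SparkSession.builder
             .appName("pyspark-hive-pipeline-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a raw staging table, derive a date column, and aggregate per account.
    raw = spark.table("staging.transactions")  # placeholder table

    daily = (raw
             .withColumn("txn_date", F.to_date("txn_ts"))
             .groupBy("txn_date", "account_id")
             .agg(F.sum("amount").alias("daily_amount")))

    # Write the result back as a partitioned Hive table.
    (daily.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .saveAsTable("analytics.daily_account_totals"))  # placeholder table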
Confidential, Cincinnati, OH
Hadoop Developer
Environment: Hadoop, MapReduce, HDFS, UNIX, Hive, Sqoop, Cassandra, ETL, Pig Script, Cloudera, Oozie
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Cassandra CQL and Java APIs to retrieve data from Cassandra tables (a minimal sketch follows this list).
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Worked hands-on with the ETL process.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behaviour, such as shopping enthusiasts, travellers, and music lovers.
- Exported the patterns analysed back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
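The Cassandra retrieval above was done with CQL and the Java API; purely as an illustration of the same CQL read pattern, here is a minimal sketch using the DataStax Python driver (cassandra-driver), with placeholder host, keyspace, table, and column names.

    from cassandra.cluster import Cluster

    # Connect to a Cassandra node and keyspace (placeholder host and keyspace).
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("retail")

    # Retrieve rows for one customer with a parameterized CQL query (placeholder table/columns).
    rows = session.execute(
        "SELECT order_id, amount FROM orders_by_customer WHERE customer_id = %s",
        ("C1001",),
    )
    for row in rows:
        print(row.order_id, row.amount)

    cluster.shutdown()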
Confidential, West State Street, ID
Java Developer
Environment: Java, JSP 2.1, EJB, J2EE, MVC2, Struts, Servlets 3.0, JDBC 4.0, Ajax, JavaScript, HTML5, CSS, JBoss, JTA, JMS, MDB, SOAP, XSL/XSLT, XML, MVC, DAO, JUnit, PL/SQL
Responsibilities:
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Implemented J2EE standards, MVC2 architecture using Struts Framework
- Developed web components using JSP, Servlets and JDBC
- Handled client-side validations using JavaScript and was involved in the integration of different Struts actions in the framework.
- For analysis and design of the application, created use case, class, and sequence diagrams.
- Implemented Servlets, JSP and Ajax to design the user interface
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
- Used JBoss for EJB and JTA, and for caching and clustering purposes.
- Used EJBs (session beans) to implement the business logic, JMS for sending updates to various other applications, and MDBs for routing priority requests.
- Wrote web services using SOAP for sending data to and getting data from the external interface.
- Used XSL/XSLT for transforming and displaying reports; developed schemas for XML.
- Developed web-based reporting for a monitoring system with HTML and Tiles using the Struts framework.
- Used design patterns such as Business Delegate, Service Locator, Model View Controller (MVC), Session, and DAO.
- Involved in fixing defects and unit testing with test cases using JUnit
- Developed stored procedures and triggers in PL/SQL
Confidential, Hyderabad
Java Developer
Environment: Servlets, JSP, HTML, JavaScript, XML, CSS, MVC, Struts, PL/SQL, JDBC, Oracle, Hibernate, JUnit
Responsibilities:
- Implemented server side programs by using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using Struts Framework.
- Handled the database access by implementing Controller Servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements to call from Servlets for database access.
- Designed and documented the stored procedures.
- Widely used HTML for web-based design.
- Worked on database interactions layer for updating and retrieving data from Oracle database by writing stored procedures.
- Used Spring Framework dependency injection and integration with Hibernate; involved in writing JUnit test cases.