
Data Science Resume


San Francisco, CA

SUMMARY:

  • 9+ years of experience in various IT technologies, including 4 years of hands-on experience in Big Data technologies.
  • Extensive implementation and working experience with a wide array of tools in the Big Data stack, including HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper, and HBase.
  • Proficient in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and ZooKeeper, along with AWS.
  • Strong understanding of Hadoop daemons and MapReduce concepts.
  • Used Informatica PowerCenter for Extraction, Transformation, and Loading (ETL) of data from numerous sources such as flat files, XML documents, and databases.
  • Experienced in developing UDFs for Pig and Hive using Java.
  • Strong knowledge of Spark with Scala for large-scale streaming data processing.
  • Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
  • Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch after this list).
  • Performed unit testing with JUnit test cases and integration of developed code.
  • Worked with NoSQL databases such as HBase, Cassandra, and MongoDB for extracting and storing large volumes of data.
  • Knowledge of implementing Hortonworks (HDP 2.1 and HDP 2.3) and Cloudera (CDH3, CDH4, CDH5) distributions on Linux.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
  • Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
  • Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
  • Ability to develop MapReduce programs using Java and Python.
  • Hands-on experience provisioning and managing multi-tenant Cassandra clusters in public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
  • Good understanding of and exposure to Python programming.
  • Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
  • Exported and imported data to and from Oracle using SQL Developer for analysis.
  • Good experience using Sqoop for traditional RDBMS data pulls.
  • Worked with different Hadoop distributions, including Hortonworks and Cloudera.
  • Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
  • Extensive experience in shell scripting.
  • Extensive use of open-source software and web/application servers, including the Eclipse 3.x IDE and Apache Tomcat 6.0.
  • Experience designing components with UML: use case, class, sequence, deployment, and component diagrams for the requirements.
  • Involved in report development using reporting tools such as Tableau; used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.
  • Broad design, development, and testing experience with Talend Integration Suite, with knowledge of performance tuning of mappings.
  • Experience understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Experience with cluster monitoring tools such as Ambari and Apache Hue.
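
The following is a minimal PySpark Structured Streaming sketch of the Kafka-to-Spark integration mentioned above; the broker address, topic name, and event schema are placeholder assumptions, and the job needs the spark-sql-kafka connector package on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Hypothetical event schema; real payloads are not described in the resume.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("ts", LongType()),
])

# Read a Kafka topic as a streaming DataFrame (placeholder broker and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka values arrive as bytes; cast to string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), schema).alias("e"))
          .select("e.*"))

# Write the parsed stream out; the console sink is used here only for the sketch.
query = (events.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```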

TECHNICAL SKILLS:

Big Data Ecosystem: HDFS, MapReduce, Hive, YARN, Pig, Sqoop, Kafka, Storm, Flume, Oozie, ZooKeeper, Apache Spark, NiFi, Hortonworks

NoSQL Databases: HBase, Cassandra, MongoDB, DynamoDB

Languages: Java, Python, Scala, SQL, Shell Scripting

Java/J2EE Technologies: J2EE, JSP, Servlets, JSF, JDBC, Applets, Swing, Struts, Spring, Spring Boot, Hibernate, MVC

Web Technologies: HTML, JavaScript, jQuery, AJAX, XML, JSON

Testing Frameworks: JUnit, MRUnit

Cloud and Search: AWS, Docker, Elasticsearch

Tools and Servers: Eclipse, Apache Tomcat, JBoss, Tableau

Operating Systems: Linux, Unix, Windows

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Data Science

Responsibilities:

  • Worked on transforming data using StreamSets; created multiple pipelines from various source points.
  • Used Google Cloud Storage and Amazon S3 to store data collected from Pub/Sub and various vendors.
  • Worked on BigQuery.
  • Worked on Python scripting for web scraping using Beautiful Soup (see the first sketch after this list).
  • Imported tables in formats such as Avro, CSV, and JSON from GCS or local storage into BigQuery, and performed queries on the imported tables (see the second sketch after this list).
  • Worked on Databricks to analyze data using Spark SQL, SQL queries, and Scala with Spark.
  • Worked on streaming data using StreamSets; collected data from Pub/Sub and stored it in Google Cloud Storage.
  • Worked on multi-tenancy in Looker; represented data using various standard and custom graphs, and created multiple dashboards for multiple pipelines.
  • Used Jira for creating tickets and followed the Agile Scrum methodology.
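
First, a minimal sketch of the kind of Beautiful Soup scraping script referenced above; the URL and CSS selectors are hypothetical placeholders rather than details from the original project.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical vendor page; the real sources are not named in the resume.
URL = "https://example.com/products"

def scrape_products(url):
    """Fetch a page and pull out product name/price pairs."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    rows = []
    # Selector names are illustrative; adjust them to the page's actual markup.
    for item in soup.select("div.product"):
        rows.append({
            "name": item.select_one("h2.title").get_text(strip=True),
            "price": item.select_one("span.price").get_text(strip=True),
        })
    return rows

if __name__ == "__main__":
    for row in scrape_products(URL):
        print(row)
```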
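
Second, a sketch of loading a CSV file from GCS into BigQuery with the google-cloud-bigquery client and then querying it; the project, dataset, table, and bucket names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application default credentials

# Placeholder identifiers; substitute the real project/dataset/bucket.
table_id = "my-project.analytics.vendor_data"
gcs_uri = "gs://my-bucket/exports/vendor_data.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,  # infer the schema from the file
)

# Load the file from Cloud Storage into the destination table.
load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
load_job.result()  # block until the load completes

# Query the freshly loaded table.
query = f"SELECT COUNT(*) AS row_count FROM `{table_id}`"
for row in client.query(query).result():
    print(f"Loaded {row.row_count} rows")
```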

Confidential, Rochester, MN

Hadoop Developer

Environment: Hadoop, Hortonworks, Spark, YARN, Elasticsearch, Hive/SQL, Scala, Ambari, Pig, HCatalog, MapReduce, HDFS, Sqoop, Talend, EC2, ELB, S3, Glacier, Kafka, Storm, ETL, Informatica, DB2, NiFi, Agile, JUnit, MRUnit

Responsibilities:

  • Used the Spark API over Hortonworks on AWS Linux servers to perform analytics on data.
  • Performed a Hadoop cluster upgrade from HDP 2.1 to HDP 2.3.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Set up a multi-node cluster, and planned and deployed a Hadoop cluster using Hortonworks Ambari.
  • Worked on batch processing of data sources using Apache Spark and Elasticsearch.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the first sketch after this list).
  • Created, configured, and implemented a virtual private cloud (VPC), security groups, network access control lists (NACLs), Elastic Compute Cloud (EC2) instances, an Elastic Load Balancer (ELB), and Route 53 DNS.
  • Created Elastic Block Store (EBS) volumes to preserve instance data even if the EC2 instance is terminated.
  • Created snapshots of instances and stored the data in S3.
  • Created lifecycle rules for moving data between storage classes, from S3 Standard to S3 Infrequent Access and on to Glacier for rarely accessed data (see the second sketch after this list).
  • Stored structured data in RDS and DynamoDB.
  • Configured and implemented CloudTrail and CloudFront, and monitored resources using CloudWatch.
  • Involved in messaging services such as SNS and SQS.
  • Used HCatalog to build a relational view of the data.
  • Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrames API and Spark SQL to improve performance.
  • Experience pushing data from Impala to MicroStrategy.
  • Created scripts for importing data from DB2 into HDFS/Hive using Sqoop.
  • Loaded data from different sources into Hive using Talend.
  • Implemented real-time data ingestion using Kafka.
  • Developed a data pipeline using Kafka and Storm to store data in HDFS.
  • Used all major ETL transformations to load tables through Informatica mappings.
  • Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Developed Pig scripts to parse raw data, populate staging tables, and store refined data in partitioned DB2 tables for business analysis.
  • Implemented a data flow engine for smart data ingress and egress using Apache NiFi.
  • Involved in ingesting data into HDFS using Apache NiFi.
  • Developed and deployed Apache NiFi flows across various environments, optimized NiFi data flows, and wrote QA scripts in Python for tracking missing files.
  • Worked on managing and reviewing Hadoop log files; tested and reported defects from an Agile methodology perspective.
  • Used the JUnit, EasyMock, and MRUnit testing frameworks to develop unit test cases.
  • Coordinated with the business for UAT sign-off.
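
The first sketch below shows the Hive-to-Spark conversion pattern referenced in this list; it is written in PySpark for brevity (the original work used Scala), and the table, column, and output path names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# enableHiveSupport lets Spark read existing Hive tables through the metastore.
spark = (SparkSession.builder
         .appName("hive-to-dataframe-sketch")
         .enableHiveSupport()
         .getOrCreate())

# The same aggregation expressed once as a Hive/SQL query...
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.transactions              -- placeholder Hive table
    WHERE txn_date >= '2016-01-01'
    GROUP BY customer_id
""")

# ...and once as DataFrame transformations, which are easier to compose and tune.
df_result = (spark.table("sales.transactions")
             .filter(F.col("txn_date") >= "2016-01-01")
             .groupBy("customer_id")
             .agg(F.sum("amount").alias("total_amount")))

# Both plans produce the same result; persist as Parquet for downstream jobs.
df_result.write.mode("overwrite").parquet("/data/output/customer_totals")
```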
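
The second sketch applies the S3 lifecycle rule described above (transition to Infrequent Access, then to Glacier) with boto3; the bucket name and day thresholds are assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Placeholder bucket; the real buckets are not named in the resume.
bucket = "my-archive-bucket"

lifecycle = {
    "Rules": [
        {
            "ID": "tier-then-archive",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # apply to the whole bucket
            "Transitions": [
                # Move objects to Infrequent Access after 30 days...
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                # ...and to Glacier after 90 days for rarely accessed data.
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration=lifecycle,
)
```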

Confidential, Schaumburg, IL

Hadoop Developer

Environment: Hadoop, Pig, Hive, MapReduce, Flume, HDFS, AWS, DynamoDB, PySpark, HBase, Spring Boot, Linux, Sqoop, Python, Oozie, Nagios, Ganglia, EC2, EBS, ELB, S3

Responsibilities:

  • Worked on a Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Worked on debugging and performance tuning of Hive and Pig jobs.
  • Worked in an AWS environment for developing and deploying custom Hadoop applications.
  • Extracted and stored data in DynamoDB to support the Hadoop application.
  • Generated pipelines using PySpark and Hive (see the sketch after this list).
  • Created HBase tables to store various formats of PII data coming from different portfolios.
  • Experience developing Java applications using Spring Boot.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Experience processing unstructured data using Pig and Hive.
  • Developed Spark scripts using Python.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Assisted in monitoring the Hadoop cluster using tools such as Nagios and Ganglia.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
  • Involved in configuring and administering EC2 instances, EBS volumes, snapshots, and Elastic Load Balancers (ELB).
  • Involved in creating and launching EC2 instances; created snapshots and stored them in S3.
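
Below is a minimal PySpark sketch of the kind of pipeline referenced in this list, assuming log data already staged in HDFS by Flume; the paths, columns, and Hive table names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("log-pipeline-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Placeholder HDFS path where Flume staged the raw JSON logs.
raw = spark.read.json("hdfs:///data/staging/app_logs/")

# Light cleanup: drop malformed rows and derive a partition column.
cleaned = (raw.dropna(subset=["event_time", "user_id"])
           .withColumn("event_date", F.to_date("event_time")))

# Write into a partitioned Hive table for downstream Hive and Pig jobs.
(cleaned.write
 .mode("append")
 .partitionBy("event_date")
 .saveAsTable("analytics.app_events"))
```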

Confidential, Cincinnati, OH

Hadoop Developer

Environment: Hadoop, MapReduce, HDFS, UNIX, Hive, Sqoop, Cassandra, ETL, Pig Script, Cloudera, Oozie

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Involved in loading data from the UNIX file system to HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Used Cassandra CQL and Java APIs to retrieve data from Cassandra tables (see the sketch after this list).
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
  • Worked hands-on with the ETL process.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior segments such as shopping enthusiasts, travellers, and music lovers.
  • Exported the analyzed patterns back into Teradata using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Developed Hive queries to process the data and generate data cubes for visualization.
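
A short sketch of the Cassandra retrieval pattern mentioned above, using the DataStax Python driver rather than the Java API used on the project; the contact point, keyspace, table, and key are placeholder assumptions.

```python
from cassandra.cluster import Cluster

# Placeholder contact point and keyspace; the real cluster is not described.
cluster = Cluster(["127.0.0.1"])
session = cluster.connect("retail")

# Parameterized CQL query against a hypothetical purchases table.
rows = session.execute(
    "SELECT user_id, category, amount FROM purchases WHERE user_id = %s",
    ("user-123",),
)

for row in rows:
    print(row.user_id, row.category, row.amount)

cluster.shutdown()
```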

Confidential, West State Street, ID

Java Developer

Environment: Java, JSP 2.1, EJB, J2EE, MVC2, Struts, Servlets 3.0, JDBC 4.0, Ajax, JavaScript, HTML5, CSS, JBoss, JTA, JMS, MDB, SOAP, XSL/XSLT, XML, MVC, DAO, JUnit, PL/SQL

Responsibilities:

  • Developed, tested, and debugged Java, JSP, and EJB components using Eclipse.
  • Implemented J2EE standards and MVC2 architecture using the Struts framework.
  • Developed web components using JSP, Servlets, and JDBC.
  • Handled client-side validations using JavaScript and was involved in integrating various Struts actions within the framework.
  • Created use case, class, and sequence diagrams for application analysis and design.
  • Implemented Servlets, JSP, and Ajax to design the user interface.
  • Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
  • Used JBoss for EJB and JTA, and for caching and clustering purposes.
  • Used EJBs (session beans) to implement the business logic, JMS for sending updates to various other applications, and MDBs for routing priority requests.
  • Wrote SOAP web services for sending data to and receiving data from the external interface.
  • Used XSL/XSLT for transforming and displaying reports; developed XML schemas.
  • Developed web-based reporting for a monitoring system with HTML and Tiles using the Struts framework.
  • Used design patterns such as Business Delegate, Service Locator, Model View Controller (MVC), Session, and DAO.
  • Involved in fixing defects and unit testing with test cases using JUnit.
  • Developed stored procedures and triggers in PL/SQL.

Confidential, Hyderabad

Java Developer

Environment: Servlets, JSP, HTML, JavaScript, XML, CSS, MVC, Struts, PL/SQL, JDBC, Oracle, Hibernate, JUnit

Responsibilities:

  • Implemented server-side programs using Servlets and JSP.
  • Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
  • Implemented MVC using Struts Framework.
  • Handled the database access by implementing Controller Servlet.
  • Implemented PL/SQL stored procedures and triggers.
  • Used JDBC prepared statements, called from Servlets, for database access.
  • Designed and documented the stored procedures.
  • Widely used HTML for web-based design.
  • Worked on the database interaction layer for updating and retrieving data from the Oracle database by writing stored procedures.
  • Used Spring framework dependency injection and integration with Hibernate; involved in writing JUnit test cases.
