Data Science Resume
San Francisco, CA
SUMMARY:
- 9+ years of experience in various IT-related technologies, including 4 years of hands-on experience in Big Data technologies.
- Extensive implementation and working experience with a wide array of tools in the Big Data stack, such as HDFS, Spark, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Kafka, ZooKeeper, and HBase.
- Proficient in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, YARN, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and ZooKeeper, as well as AWS.
- Strong understanding of Hadoop daemons and MapReduce concepts.
- Used Informatica PowerCenter for Extraction, Transformation, and Loading (ETL) of data from numerous sources such as flat files, XML documents, and databases.
- Experienced in developing UDFs for Pig and Hive using Java.
- Strong knowledge of Spark with Scala for large-scale streaming data processing.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (a minimal sketch follows this summary).
- Performed unit testing with JUnit test cases and integration of developed code.
- Worked with NoSQL databases such as HBase, Cassandra, and MongoDB to extract and store large volumes of data.
- Knowledgeable in implementing Hortonworks (HDP 2.1 and HDP 2.3) and Cloudera (CDH3, CDH4, CDH5) distributions on Linux.
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores.
- Experienced in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Experience in designing star schemas and snowflake schemas for data warehouse and ODS architectures.
- Ability to develop MapReduce programs using Java and Python.
- Hands-on experience in provisioning and managing multi-tenant Cassandra clusters on public cloud environments: Amazon Web Services (AWS) EC2 and OpenStack.
- Good understanding of and exposure to Python programming.
- Knowledge of developing NiFi flow prototypes for data ingestion into HDFS.
- Exported and imported data to and from Oracle using SQL Developer for analysis.
- Good experience using Sqoop for traditional RDBMS data pulls.
- Worked with different distributions of Hadoop, such as Hortonworks and Cloudera.
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors.
- Extensive experience in Shell scripting.
- Extensive use of open-source software and web/application servers such as the Eclipse 3.x IDE and Apache Tomcat 6.0.
- Experience in designing components using UML: use case, class, sequence, deployment, and component diagrams for the requirements.
- Involved in report development using reporting tools such as Tableau; used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.
- Broad design, development, and testing experience with Talend Integration Suite, and knowledge of performance tuning of mappings.
- Experience in understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
- Experience with cluster monitoring tools such as Ambari and Apache Hue.
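A minimal sketch of the Kafka-to-Spark Streaming integration described above, using PySpark Structured Streaming; the broker address, topic name, and console sink are placeholder assumptions for illustration only.

    from pyspark.sql import SparkSession

    # Build a Spark session (assumes the spark-sql-kafka connector is on the classpath).
    spark = (SparkSession.builder
             .appName("kafka-spark-streaming-sketch")
             .getOrCreate())

    # Read a Kafka topic as an unbounded streaming DataFrame.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
              .option("subscribe", "events")                      # placeholder topic
              .load())

    # Kafka delivers key/value as bytes; cast to strings before downstream parsing.
    parsed = events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    # Write to the console sink just to verify the pipeline end to end.
    query = (parsed.writeStream
             .format("console")
             .outputMode("append")
             .start())
    query.awaitTermination()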
TECHNICAL SKILLS:
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Flume, Oozie, ZooKeeper, Kafka, Storm, Apache Spark, NiFi, Hortonworks
NoSQL / Databases: HBase, Cassandra, MongoDB, DynamoDB, SQL
Languages: Java, Python, Scala, Shell Scripting, JavaScript, HTML, XML, JSON, AJAX
Java / J2EE: J2EE, JSP, Servlets, JSF, JDBC, Applets, Swing, Struts, Spring, Spring Boot, Hibernate, MVC, jQuery
Testing: JUnit, MRUnit
Cloud, Servers & Tools: AWS, Docker, Eclipse, Elasticsearch, JBoss, Apache Tomcat, Tableau
Operating Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Data Science
Responsibilities:
- Worked on transforming data using StreamSets; created multiple pipelines from various source points.
- Used Google Cloud Storage and Amazon S3 to store data collected from Pub/Sub and various vendors.
- Worked on BigQuery.
- Worked on Python scripting for web scraping using Beautiful Soup (a minimal sketch follows this list).
- Imported tables in formats such as Avro, CSV, and JSON from GCS or local storage into BigQuery, and performed queries on the imported tables.
- Worked on Databricks to analyze data using Spark SQL, SQL queries, and Scala with Spark.
- Worked on streaming data using StreamSets; collected data from Pub/Sub and stored it in Google Cloud Storage.
- Worked on multi-tenancy in Looker, representing data using various standard and custom graphs; created multiple dashboards for multiple pipelines.
- Used Jira for creating tickets and followed the Agile Scrum methodology.
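A minimal sketch of the Beautiful Soup web scraping mentioned in this list; the URL and the table-row structure are placeholder assumptions, not details of the actual vendor pages.

    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/listings"  # placeholder URL

    # Fetch the page and parse the returned HTML.
    response = requests.get(URL, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Collect the cell text of every table row into a list of records.
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)

    print(f"Scraped {len(rows)} rows")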
Confidential, Rochester, MN
Hadoop Developer
Environment: Hadoop, Hortonworks, Spark, YARN, Elasticsearch, Hive/SQL, Scala, Ambari, Pig, HCatalog, MapReduce, HDFS, Sqoop, Talend, EC2, ELB, S3, Glacier, Kafka, Storm, ETL, Informatica, DB2, NiFi, Agile, JUnit, MRUnit
Responsibilities:
- Used the Spark API over Hortonworks on AWS Linux servers to perform analytics on data.
- Performed a Hadoop cluster upgrade from HDP 2.1 to HDP 2.3.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Set up a multi-node cluster; planned and deployed a Hadoop cluster using Hortonworks Ambari.
- Worked on batch processing of data sources using Apache Spark and Elasticsearch.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Created, configured, and implemented a Virtual Private Cloud (VPC), security groups, Network Access Control Lists (NACL), Elastic Compute Cloud (EC2) instances, an Elastic Load Balancer (ELB), and Route 53 DNS.
- Created Elastic Block Store (EBS) volumes to retain instance data even if the EC2 instance is terminated.
- Created snapshots of instances and stored the data in S3.
- Created lifecycle rules for moving data between storage classes: from S3 to S3 Infrequent Access, and from S3 to Glacier for data that is accessed very rarely.
- Stored structured data in RDS and DynamoDB.
- Configured and implemented CloudTrail and CloudFront, and monitored with CloudWatch.
- Involved in messaging services such as SNS and SQS.
- Used HCatalog to build a relational view of the data.
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance.
- Experience in pushing data from Impala to MicroStrategy.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data from different sources into Hive using the Talend tool.
- Implemented real-time data ingestion using Kafka.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Used all major ETL transformations to load tables through Informatica mappings.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Implemented a data flow engine for smart data ingress and egress using Apache NiFi.
- Involved in ingesting data into HDFS using Apache NiFi.
- Developed and deployed Apache NiFi flows across various environments, optimized NiFi data flows, and wrote QA scripts in Python for tracking missing files (a minimal sketch follows this list).
- Worked on managing and reviewing Hadoop log files; tested and reported defects from an Agile methodology perspective.
- Used the JUnit, EasyMock, and MRUnit testing frameworks to develop unit test cases.
- Coordinated with the business for UAT sign-off.
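A minimal sketch of a Python QA check for missing files, as referenced in the NiFi bullet above; the expected file names and the HDFS landing path are placeholder assumptions, and the script simply shells out to the standard hdfs CLI.

    import subprocess

    EXPECTED_FILES = {"feed_20160101.csv", "feed_20160102.csv"}  # placeholder names
    HDFS_DIR = "/data/landing/feed"                              # placeholder path

    # List the landing directory with the hdfs CLI and keep just the file names.
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", HDFS_DIR],
        capture_output=True, text=True, check=True,
    ).stdout

    landed = {line.rsplit("/", 1)[-1] for line in listing.splitlines() if HDFS_DIR in line}

    # Report anything that was expected but has not arrived.
    missing = sorted(EXPECTED_FILES - landed)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected files are present.")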
Confidential, Schaumburg, IL
Hadoop Developer
Environment: Hadoop, Pig, Hive, MapReduce, Flume, HDFS, AWS, Dynamo DB, PySpark, HBase, Spring Boot, Linux, Sqoop, Python, Oozie, Nagios, Ganglia, EC2, EBS, ELB, S3
Responsibilities:
- Worked on a Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Worked in an AWS environment for developing and deploying custom Hadoop applications.
- Extracted and stored data in DynamoDB for use by the Hadoop application.
- Generated pipelines using PySpark and Hive (a minimal sketch follows this list).
- Created HBase tables to store various data formats of PII data coming from different portfolios.
- Experience in developing Java applications using Spring Boot.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience working on processing unstructured data using Pig and Hive.
- Developed Spark scripts using Python.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
- Assisted in monitoring the Hadoop cluster using tools such as Nagios and Ganglia.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Involved in configuring and administering EC2 instances, EBS volumes, snapshots, and Elastic Load Balancers (ELB).
- Involved in creating and launching instances in EC2; created snapshots and stored them in S3.
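A minimal sketch of a PySpark-and-Hive pipeline of the kind referenced in this list; the table and column names are placeholder assumptions, and Hive support is assumed to be enabled for the Spark session.

    from pyspark.sql import SparkSession, functions as F

    # Enable Hive support so Spark can read and write Hive tables directly.
    spark = (SparkSession.builder
             .appName("pyspark-hive-pipeline-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read a raw staging table, derive a date column, and aggregate per account.
    raw = spark.table("staging.transactions")  # placeholder table

    daily = (raw
             .withColumn("txn_date", F.to_date("txn_ts"))
             .groupBy("txn_date", "account_id")
             .agg(F.sum("amount").alias("daily_amount")))

    # Write the result back as a partitioned Hive table.
    (daily.write
     .mode("overwrite")
     .partitionBy("txn_date")
     .saveAsTable("analytics.daily_account_totals"))  # placeholder table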
Confidential, Cincinnati, OH
Hadoop Developer
Environment: Hadoop, MapReduce, HDFS, UNIX, Hive, Sqoop, Cassandra, ETL, Pig Script, Cloudera, Oozie
Responsibilities:
- Installed and configured Hadoop MapReduce, HDFS and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Imported and exported data into HDFS and Hive using Sqoop.
- Used Cassandra CQL and Java APIs to retrieve data from Cassandra tables (a minimal sketch follows this list).
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
- Worked hands-on with the ETL process.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behaviour, such as shopping enthusiasts, travellers, and music lovers.
- Exported the patterns analysed back into Teradata using Sqoop.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
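The Cassandra retrieval above was done with CQL and the Java API; purely as an illustration of the same CQL read pattern, here is a minimal sketch using the DataStax Python driver (cassandra-driver), with placeholder host, keyspace, table, and column names.

    from cassandra.cluster import Cluster

    # Connect to a Cassandra node and keyspace (placeholder host and keyspace).
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("retail")

    # Retrieve rows for one customer with a parameterized CQL query (placeholder table/columns).
    rows = session.execute(
        "SELECT order_id, amount FROM orders_by_customer WHERE customer_id = %s",
        ("C1001",),
    )
    for row in rows:
        print(row.order_id, row.amount)

    cluster.shutdown()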
Confidential, West State Street, ID
Java Developer
Environment: Java, JSP 2.1, EJB, J2EE, MVC2, Struts, Servlets 3.0, JDBC 4.0, Ajax, JavaScript, HTML5, CSS, JBoss, JTA, JMS, MDB, SOAP, XSL/XSLT, XML, MVC, DAO, JUnit, PL/SQL
Responsibilities:
- Developed, Tested and Debugged the Java, JSP and EJB components using Eclipse.
- Implemented J2EE standards, MVC2 architecture using Struts Framework
- Developed web components using JSP, Servlets and JDBC
- Handled client-side validations using JavaScript and was involved in the integration of different Struts actions in the framework.
- For analysis and design of the application, created use case, class, and sequence diagrams.
- Implemented Servlets, JSP and Ajax to design the user interface
- Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
- Used JBoss for EJB and JTA, and for caching and clustering purposes.
- Used EJBs (session beans) to implement the business logic, JMS for sending updates to various other applications, and MDBs for routing priority requests.
- Wrote web services using SOAP for sending data to and getting data from the external interface.
- Used XSL/XSLT for transforming and displaying reports; developed schemas for XML.
- Developed web-based reporting for a monitoring system with HTML and Tiles using the Struts framework.
- Used design patterns such as Business Delegate, Service Locator, Model View Controller (MVC), Session, and DAO.
- Involved in fixing defects and unit testing with test cases using JUnit
- Developed stored procedures and triggers in PL/SQL
Confidential, Hyderabad
Java Developer
Environment: Servlets, JSP, HTML, JavaScript, XML, CSS, MVC, Struts, PL/SQL, JDBC, Oracle, Hibernate, JUnit
Responsibilities:
- Implemented server side programs by using Servlets and JSP.
- Designed, developed, and validated the user interface using HTML, JavaScript, XML, and CSS.
- Implemented MVC using Struts Framework.
- Handled the database access by implementing Controller Servlet.
- Implemented PL/SQL stored procedures and triggers.
- Used JDBC prepared statements to call from Servlets for database access.
- Designed and documented the stored procedures.
- Widely used HTML for web-based design.
- Worked on database interactions layer for updating and retrieving data from Oracle database by writing stored procedures.
- Used Spring Framework dependency injection and integration with Hibernate; involved in writing JUnit test cases.