
Data Analyst Resume

San Antonio

PROFESSIONAL SUMMARY:

  • 14+ years of experience in Information Technology involving analysis, design, testing, and implementation. Excellent skills in state-of-the-art client/server computing, desktop applications, and website development.
  • 5+ years of work experience with digital data platforms and analytics tools, using both NoSQL and relational databases.
  • Extensive hands-on Python programming, including Python functions and the Pandas, NumPy, and Matplotlib libraries widely used in data science (a short sketch follows this list).
  • Working with Agile product teams, including participation in PI Planning.
  • 5+ years of work experience in big data analytics, with hands-on experience writing MapReduce jobs on the Hadoop ecosystem, including Hive and Pig.
  • Good working experience with Hadoop architecture and its components in the Cloudera Hadoop ecosystem, including HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using Hadoop ecosystem components like MapReduce, HDFS, Hive, Sqoop, Pig, ZooKeeper, and Flume.
  • Good working experience with Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
  • Expertise in Hadoop big data technologies: Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, and Sqoop.
  • Good working experience with Hadoop cluster architecture and cluster monitoring. In-depth understanding of data structures and algorithms.
  • Experience in managing and reviewing Hadoop log files.
  • Experience in implementing standards and processes for Hadoop based application design and implementation.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
  • Experience in managing Hadoop clusters using Cloudera Manager Tool.
  • In-depth knowledge of databases such as MySQL and extensive experience writing SQL queries, stored procedures, triggers, cursors, functions, and packages.
  • Excellent knowledge of Informatica, Informatica MDM, Cognos, HTML, CSS, JavaScript, PHP.
  • Good working experience installing and maintaining Linux servers.
  • Experience in Data Sharing and backup through NFS.
  • Experience monitoring system metrics and logs for problems; adding, removing, and updating user account information; resetting passwords; etc.
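
A minimal sketch of the kind of Pandas/NumPy/Matplotlib analysis referenced above; the file name and column names (order_date, amount) are hypothetical and not taken from any specific project.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    # Load a hypothetical orders extract and drop rows with missing amounts
    df = pd.read_csv("orders.csv", parse_dates=["order_date"])
    df = df.dropna(subset=["amount"])

    # Derived column and monthly aggregation
    df["log_amount"] = np.log1p(df["amount"])
    monthly = df.set_index("order_date")["amount"].resample("M").sum()

    # Simple trend plot saved to disk
    monthly.plot(kind="line", title="Monthly order amount")
    plt.xlabel("Month")
    plt.ylabel("Total amount")
    plt.tight_layout()
    plt.savefig("monthly_amount.png")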

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, HDFS, Hive, Spark, MapReduce, Pig, Sqoop, Flume, ZooKeeper, Cloudera, Amazon EC2, EMR, S3, Redshift, AWS

Reporting Tools: Cognos, Jaspersoft, Qlik Sense, Tableau

Scripting Languages: Unix Shell, Perl, R

Programming Languages: Python (NumPy, Pandas, Matplotlib), Scala, Java

Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlets, JSP, DOM, XML, XSLT.

Application Server: WebLogic Server, Apache Tomcat.

DB Languages: MySQL, PL/SQL, Postgres, ParAccel, IMS DB, DB2

NoSQL Databases: HBase, MongoDB, Cassandra, Azure

Databases/ETL: Oracle 9i/10g/11g, MySQL 5.2, DB2, Informatica BDE, Talend

Operating Systems: Linux, UNIX, Windows 2003 Server

IDEs: Eclipse, NetBeans, JDeveloper, IntelliJ IDEA

Version Control: CVS, SVN, Git

PROFESSIONAL EXPERIENCE:

Confidential, San Antonio

Data Analyst

Responsibilities:

  • Working at an advanced level with relational databases, a variety of NoSQL databases, query authoring (SQL), and analytics tools
  • Working closely with Agile product teams on the evaluation, design, modeling, and validation of data store solutions to meet their needs, participating in PI Planning and conducting stand-up calls.
  • Developing database architectures to ensure system availability, security, scalability, performance and reliability meet business requirements.
  • Collaborate across multiple product teams to define and implement data retention, migration, archival, retrieval, and purge processes
  • Designed data pipelines and applications for data transformation, data migration, and data generation to meet product team needs
  • Designing the architecture model and automating continuous integration/delivery of data store changes, applying evolutionary database principles to facilitate development practices
  • Providing technical guidance and participating in the team's design and maintenance of optimal database backup/recovery and data requirement solutions to meet business continuity needs
  • Providing 24x7 on-call support for production and development database environments, leading root cause analysis of incidents, and taking proactive/preventive measures using Control-M.
  • Self-directed and take initiative to anticipate/support the needs of multiple teams, systems and products
  • Serving as a strong team player and a leading technical mentor on the team
  • Ensuring data solutions are compliant with GDPR and industry standards
  • Working with AWS (Amazon Web Services) tooling to securely store data and to automatically build source code, then test and deploy applications to AWS or on-premises environments (see the sketch after this list).
  • Maintaining good communication between clients and the onshore and offshore teams
  • Conducting retrospective meetings offshore and sharing the outcomes with the onsite team at the end of each iteration
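
As a rough illustration of the AWS storage work mentioned above, a minimal boto3 sketch for uploading an extract to S3 with server-side encryption; the bucket, key, and file names are hypothetical and not taken from the project.

    import boto3

    s3 = boto3.client("s3")

    # Upload a local extract to S3, requesting SSE-S3 server-side encryption
    s3.upload_file(
        Filename="daily_extract.csv",
        Bucket="example-data-bucket",
        Key="extracts/2020-06-01/daily_extract.csv",
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )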

Confidential

Data Analyst

Responsibilities:

  • Assign the tasks to the team based on the availability of the associates
  • Get the status from the team regularly in the Agile methodology standup meeting
  • Update the status to the onsite coordinators
  • Importing and exporting data to and from HDFS and Hive using Sqoop (a short sketch follows this list)
  • Experienced in analyzing data with Hive and Pig
  • Bringing data from Informatica/DataStage into the Hadoop environment using Charon and Thoosa
  • Attend agile meetings with offshore and onshore teams, setup code walk through sessions with application team and customers.
  • After cutover, monitor two incremental data loads and then hand the process over to the application team.
  • Responsible for analyzing and modifying Python and Shell scripts.
  • Responsible for code fixes if anything failed in production runs.
  • Update the ETL overview document based on the application before handing over to application team.
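
A minimal sketch of the kind of Sqoop import referenced in the list above, wrapped in a Python script since the role also covered maintaining Python and shell scripts; the JDBC URL, credentials file, and table names are hypothetical.

    import subprocess

    # Hypothetical Sqoop import from MySQL into a Hive staging table
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://dbhost:3306/sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.dbpass",
        "--table", "orders",
        "--hive-import",
        "--hive-table", "staging.orders",
        "--num-mappers", "4",
    ]

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface Sqoop's stderr so a failed production run can be triaged
        raise RuntimeError(result.stderr)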

Confidential, Columbus, IN

Hadoop Developer

Responsibilities:

  • Analyzed the requirements to set up a cluster
  • Capable of leading medium-sized teams.
  • Worked on Hadoop, MapReduce, and YARN/MRv2; developed multiple streaming jobs in Java for structured, semi-structured, and unstructured data.
  • Involved in Configuring Hadoop cluster and load balancing across the nodes
  • Created Hive queries to compare raw data with EDW tables and perform aggregations
  • Experienced in developing custom input formats and data types to parse and process unstructured and semi structured input data and mapped them into key value pairs to implement business logic in Map-Reduce.
  • Experience implementing custom serializers, interceptors, sources, and sinks in Flume, as required, to ingest data from multiple sources.
  • Experience setting up fan-out flows in Flume, designing a V-shaped architecture to take data from many sources and ingest it into a single sink.
  • Importing and exporting data into HDFS and Hive using Sqoop
  • Experienced in analyzing data with Hive and Pig
  • Experienced in designing RESTful services using Java-based APIs like Jersey.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS
  • Integrating bulk data into MongoDB file system using MapReduce programs
  • Expertise in designing, data modeling for MongoDB NoSQL database
  • Experienced in managing and reviewing Hadoop log files
  • Experienced in defining job flows using Oozie workflow
  • Involved in working with Spark on top of Yarn/MRv2 for interactive and Batch Analysis
  • Worked closely with AWS EC2 infrastructure teams to troubleshoot complex issues
  • Expertise in writing Scala code using higher-order functions for iterative algorithms in Spark, with performance in mind
  • Experienced in analyzing and optimizing RDDs by controlling partitioning for the given data
  • Good understanding of the DAG for the entire Spark application flow as shown in the Spark application web UI
  • Experienced in writing real-time processing jobs using Spark Streaming with Kafka (see the sketch after this list)
  • Developed custom mappers in Python scripts, as well as Hive UDFs and UDAFs, based on the given requirements
  • Performed SQL queries on top of HBase using Kudu.
  • Used HiveQL to analyze the partitioned and bucketed data and compute various metrics for reporting
  • Experienced in querying data using SparkSQL on top of Spark engine
  • Experience in managing and monitoring Hadoop cluster using Cloudera Manager
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop
  • Unit tested a sample of raw data and improved performance and turned over to production
  • Defects were logged in JIRA by the testing team.
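
A minimal sketch of reading a Kafka topic from Spark, as referenced above; it uses the Structured Streaming API rather than the DStream-based Spark Streaming used at the time, and the broker address, topic, and HDFS paths are hypothetical (the spark-sql-kafka connector is assumed to be on the classpath).

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # Read the hypothetical "events" topic and cast the Kafka payload to strings
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092")
        .option("subscribe", "events")
        .load()
        .select(col("key").cast("string"), col("value").cast("string"))
    )

    # Persist the stream to HDFS as Parquet with checkpointing
    query = (
        events.writeStream.format("parquet")
        .option("path", "hdfs:///data/streaming/events")
        .option("checkpointLocation", "hdfs:///checkpoints/events")
        .start()
    )
    query.awaitTermination()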

Environment: CDH, Java (JDK 1.7), Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, HBase, Kudu, Pig, Oozie, Kerberos, Scala, SparkSQL, Spark Streaming, Kafka, Linux, AWS, Shell Scripting, MySQL, Oracle 11g, SQL*Plus

Confidential

Hadoop Designer

Responsibilities:

  • Prepared the DDLs for all the layers and created the tables in 5 layers.
  • Used the Parquet format to load data into the EDL layer.
  • Prepared UNIX scripts for various Hadoop-related activities.
  • Loaded data from the TL layer to the EDL (history data) layer using Oozie scripts.
  • Used Informatica as the ETL tool to load data for all layers except EDL.
  • Supported SIT, QA, UAT, and production implementation.
  • Wrote a MapReduce program to remove extra white space and convert files to comma-separated format (a short mapper sketch follows this list).
  • Used the MicroStrategy reporting tool to generate reports from the EDM.
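
A minimal sketch of the whitespace-cleanup step mentioned above, written as a Hadoop Streaming mapper in Python (the original was a MapReduce program; the assumption here is whitespace-delimited input records).

    #!/usr/bin/env python
    import sys

    # Map-only step: collapse runs of spaces/tabs and emit comma-separated fields
    for line in sys.stdin:
        fields = line.split()
        if fields:
            print(",".join(fields))

Run as a map-only streaming job (zero reducers) so each record is rewritten in place.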

Environment: Hadoop Ecosystem, HDFS, MapReduce, Informatica BDE, Hive, Pig, YARN, HBase, Sqoop, Oozie, MicroStrategy, Jira

Confidential

Hadoop Developer

Responsibilities:

  • Installed the NameNode, Secondary NameNode, ResourceManager, NodeManager, ApplicationMaster, and DataNode components using Cloudera.
  • Installed and configured Hortonworks Ambari for easy management of existing Hadoop cluster, Installed and Configured HDP.
  • Installed and configured a fully distributed, multi-node Hadoop cluster with a large number of nodes.
  • Provided Hadoop, OS, Hardware optimizations.
  • Set up the machines with network controls, static IPs, disabled firewalls, and swap memory.
  • Identified performance bottlenecks by analyzing the existing Hadoop cluster and provided performance tuning accordingly.
  • Regular Commissioning and Decommissioning of nodes depending upon the amount of data.
  • Installed and configured Hadoop components: HDFS, Hive, and HBase.
  • Communicating with the development teams and attending daily meetings.
  • Addressing and Troubleshooting issues on a daily basis.
  • Working with data delivery teams to setup new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing HDFS, Hive.
  • Cluster maintenance as well as creation and removal of nodes.
  • Monitor Hadoop cluster connectivity and security.
  • Manage and review Hadoop log files.
  • Configured the cluster to achieve the optimal results by fine tuning the cluster.
  • Copied data from one cluster to another using DistCp and automated the procedure using shell scripts (a sketch of the same idea follows this list).
  • Designed shell scripts for backing up important metadata and rotating logs on a monthly basis.
  • Implemented open source monitoring tool GANGLIA for monitoring the various services across the cluster.
  • Testing, evaluation and troubleshooting of different NoSQL database systems and cluster configurations to ensure high-availability in various crash scenarios.
  • Performance tuning and stress-testing of NoSQL database environments in order to ensure acceptable database performance in production mode.
  • Designed the cluster so that only one secondary name node daemon could be run at any given time.
  • Implemented commissioning and decommissioning of data nodes, killing the unresponsive task tracker and dealing with blacklisted task trackers.
  • Moved data from HDFS to a MySQL database and vice versa using Sqoop.
  • Provided the necessary support to the ETL team when required.
  • Integrated Nagios in the Hadoop cluster for alerts.
  • Performed both major and minor upgrades to the existing cluster and also rolling back to the previous version.
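
A minimal sketch of automating a cluster-to-cluster DistCp copy, in the spirit of the shell scripts described above but written in Python; the cluster addresses and paths are hypothetical.

    import subprocess
    from datetime import date

    # Hypothetical source and date-stamped destination paths
    src = "hdfs://cluster-a:8020/warehouse/events"
    dst = f"hdfs://cluster-b:8020/backups/events/{date.today():%Y%m%d}"

    # -update copies only changed files; -p preserves permissions and timestamps
    result = subprocess.run(
        ["hadoop", "distcp", "-update", "-p", src, dst],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        raise RuntimeError(f"DistCp failed: {result.stderr}")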

Environment: Linux, HDFS, MapReduce, KDC, Nagios, Ganglia, Oozie, Sqoop, Cloudera Manager.

Confidential

Team Lead

Responsibilities:

  • Database analyzing, design and implementation
  • Developed the user interface using HTML, JavaScript, and CSS.
  • Database connections and code implementation.
  • Used Python because it supports multiple programming paradigms, including object-oriented, imperative, and functional styles (a short illustration follows this list).
  • Developed Business components using Java Beans and database connections using JDBC.
  • Used the .NET Framework because its base class library provides user interface, data access, database connectivity, cryptography, web application development, numeric algorithms, and network communication functionality.
  • Analyzed program structure and constructed the GUI.
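
A minimal illustration of the multi-paradigm point above: the same total computed in imperative, functional, and object-oriented Python (the data is hypothetical).

    from functools import reduce

    prices = [10.0, 25.5, 7.25]

    # Imperative style
    total = 0.0
    for p in prices:
        total += p

    # Functional style
    total_fn = reduce(lambda acc, p: acc + p, prices, 0.0)

    # Object-oriented style
    class Cart:
        def __init__(self, prices):
            self.prices = list(prices)

        def total(self):
            return sum(self.prices)

    assert total == total_fn == Cart(prices).total()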

Environment: HTML, JSP, Java Beans, Java, Python, JDK 1.2/JDK 1.4, CSS, Apache Tomcat, JavaScript, MS SQL.

Confidential

Informatica Developer

Responsibilities:

  • Extracted data from various sources such as Oracle, mainframes, and flat files, transformed it per the mapping specification documents, and loaded it into target tables using Informatica mappings.
  • Prepared a transformation specification for each mapping. Worked extensively on different types of transformations such as Source Qualifier, Expression, Filter, Aggregator, Joiner, Rank, Stored Procedure, Update Strategy, and Lookup.
  • Used Informatica features to implement Type II changes in slowly changing dimension (SCD) tables (a pattern sketch follows this list)
  • Redesigned broken ETL processes using Informatica to load data from heterogeneous sources into the target data warehouse database
  • Developed SQL overrides in the Source Qualifier according to business requirements. Provided end-user support and fixed bugs.
  • Performed data analysis, found data issues, and reported them to clients. Contributed to testing and code review for the developed mappings.
  • Provided 24/7 production support, both intraday and at EOD, ensuring smooth BAU and meeting SLAs without stoppers or delays.
  • Tracked technical issues and identified resolutions jointly with application vendor and infrastructure vendor experts through the ticketing system, FMS.
  • Carried out user acceptance tests and performance tests before deploying to production. Raised RFCs (Requests for Change) to deploy fixes to production.
  • Resolved any issues in mappings and applications
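
A compact sketch of the Type II slowly-changing-dimension pattern referenced above, shown with pandas rather than Informatica transformations; the column names (customer_id, address, effective_date, end_date, is_current) are hypothetical.

    import pandas as pd

    # Existing dimension with one current row, plus an incoming change
    dim = pd.DataFrame({
        "customer_id": [1], "address": ["old address"],
        "effective_date": [pd.Timestamp("2019-01-01")],
        "end_date": [pd.NaT], "is_current": [True],
    })
    incoming = pd.DataFrame({"customer_id": [1], "address": ["new address"]})
    load_date = pd.Timestamp("2020-06-01")

    # Find incoming rows whose attributes differ from the current dimension rows
    merged = incoming.merge(dim[dim["is_current"]], on="customer_id",
                            how="left", suffixes=("", "_dim"))
    changed = merged[merged["address"] != merged["address_dim"]]

    # Type II: expire the old version ...
    expire = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
    dim.loc[expire, ["end_date", "is_current"]] = [load_date, False]

    # ... and insert the new current version
    new_rows = changed[["customer_id", "address"]].assign(
        effective_date=load_date, end_date=pd.NaT, is_current=True)
    dim = pd.concat([dim, new_rows], ignore_index=True)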

Environment: Data Warehousing - Informatica 9.0.x/8.6.x in Power Center Design, Cognos 10, IDQ

Confidential, NA

Developer

Responsibilities:

  • Extracted data from various sources such as Oracle, mainframes, and flat files, transformed it per the mapping specification documents, and loaded it into target tables using Informatica mappings.
  • Prepared a transformation specification for each mapping. Worked extensively on different types of transformations such as Source Qualifier, Expression, Filter, Aggregator, Joiner, Rank, Stored Procedure, Update Strategy, and Lookup.
  • Used Informatica features to implement Type II changes in slowly changing dimension (SCD) tables
  • Redesigned broken ETL processes using Informatica to load data from heterogeneous sources into the target data warehouse database
  • Developed SQL overrides in the Source Qualifier according to business requirements. Provided end-user support and fixed bugs.
  • Performed data analysis, found data issues, and reported them to clients. Contributed to testing and code review for the developed mappings.
  • Resolved any issues in mappings and applications

Environment: Data Warehousing - Informatica 8.6.x in Power Center Design, Cognos 8.4

Confidential

Informatica Developer

Responsibilities:

  • Extracted data from various sources such as Oracle, mainframes, and flat files, transformed it per the mapping specification documents, and loaded it into target tables using Informatica mappings.
  • Prepared a transformation specification for each mapping. Worked extensively on different types of transformations such as Source Qualifier, Expression, Filter, Aggregator, Joiner, Rank, Stored Procedure, Update Strategy, and Lookup.
  • Used Informatica features to implement Type II changes in slowly changing dimension (SCD) tables
  • Redesigned broken ETL processes using Informatica to load data from heterogeneous sources into the target data warehouse database
  • Developed SQL overrides in the Source Qualifier according to business requirements. Provided end-user support and fixed bugs.
  • Performed data analysis, found data issues, and reported them to clients. Contributed to testing and code review for the developed mappings.
  • Resolved any issues in mappings and applications

Environment: Data Warehousing - Informatica 8.6.x in Power Center Design
