
Data Analyst Resume

Seattle, WA

SUMMARY

  • 7+ years of IT experience in Data/Business Analysis, ETL Development, Data Modeling, and Project Management, with 2+ years of experience in Big Data and related Hadoop technologies.
  • Excellent knowledge of the internal workings of HDFS and MapReduce.
  • In-depth knowledge of Hadoop ecosystem components such as Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, and Cloudera Manager.
  • Experience in deploying Hadoop clusters on public and private cloud environments such as Amazon AWS, Rackspace, and OpenStack.
  • Experience in using HCatalog with Hive, Pig, and HBase.
  • Exposure to NoSQL databases such as HBase and Cassandra.
  • Excellent knowledge of Python collections and multi-threading.
  • Skilled in Python, with proven expertise in adopting new tools and technical developments.
  • Experience in Apache Spark cluster and streams processing using Spark Streaming
  • Worked with several Python packages such as NumPy, SciPy, pandas, and PyTables (see the sketch after this list).
  • Strong experience in Business and Data Analysis, Data Profiling, Data Migration, Data Conversion, Data Quality, Data Integration, Metadata Management Services, and Configuration Management.
  • Proficiency in multiple databases, including Teradata, MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server.
  • Strong experience in interacting with stakeholders/customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, identifying and analyzing risks using appropriate templates and analysis tools.
  • Experience in various phases of Software Development life cycle (Analysis, Requirements gathering, Designing) with expertise in documenting various requirement specifications, functional specifications, Test Plans, Source to Target mappings, SQL Joins.
  • Good understanding of Relational Database Design, Data Warehouse/OLAP concepts and methodologies
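
As an illustration of the Python package work listed above, here is a minimal pandas/NumPy profiling sketch; the CSV path and column names are assumptions for illustration, not from an actual engagement:

```python
# Minimal sketch of a pandas/NumPy data-profiling pass, as referenced in
# the summary above. The CSV path and column names are assumptions.
import numpy as np
import pandas as pd

df = pd.read_csv("claims_sample.csv")              # hypothetical input file
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
df = df.dropna(subset=["amount"])                  # basic data-quality pass
df["log_amount"] = np.log1p(df["amount"])          # tame skewed amounts
summary = df.groupby("region")["amount"].agg(["count", "mean", "sum"])
print(summary.sort_values("sum", ascending=False))
```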

TECHNICAL SKILLS

  • MS Office
  • MySQL
  • Java
  • Linux
  • Hadoop
  • Data Modeling
  • Data Analysis

PROFESSIONAL EXPERIENCE

Confidential, Seattle WA

Data Analyst

Responsibilities:

  • Worked with the BI team to gather report requirements and used Sqoop to move data into HDFS and Hive.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Assisted with data capacity planning and node forecasting.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Developed MapReduce programs to extract and transform the data sets; the results were exported back to the RDBMS using Sqoop.
  • Responsible for importing data (mostly log files) from various sources into HDFS using Flume.
  • Created tables in Hive and loaded them with the structured data produced by the MapReduce jobs.
  • Developed HiveQL queries to extract the required information.
  • Exported the required information back to the RDBMS using Sqoop, making the data available to the claims processing team for processing claims.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into HDFS and Pig to pre-process the data.
  • Managed and reviewed Hadoop log files.
  • Performed data analysis using Python pandas.
  • Processed data records and returned computed results using the MongoDB aggregation framework, then parsed the aggregated data into Apache Solr and the OrientDB graph database (sketched in Python below).

Environment: HDFS, Pig, Hive, MapReduce, Linux, HBase, Flume, Sqoop, MongoDB, VMware, Eclipse, Cloudera.
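
A minimal Python sketch of the MongoDB aggregation step described above, assuming a local MongoDB instance; the database, collection, and field names are placeholders, not the project's actual schema:

```python
# Hedged sketch of the MongoDB aggregation work above; database,
# collection, and field names are illustrative assumptions.
import pandas as pd
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
claims = client["claims_db"]["claims"]           # hypothetical collection

# Group processed claim records by status; compute counts and totals.
pipeline = [
    {"$match": {"processed": True}},
    {"$group": {"_id": "$status",
                "count": {"$sum": 1},
                "total_amount": {"$sum": "$amount"}}},
    {"$sort": {"total_amount": -1}},
]
results = pd.DataFrame(list(claims.aggregate(pipeline)))
print(results)
```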

Confidential, Chicago IL

Hadoop/ Big Data Analyst

Responsibilities:

  • Developed MapReduce programs to parse and filter the raw data and populate partitioned tables in Teradata.
  • Created Hive queries that helped market analysts spot emerging trends by comparing incremental data with Teradata reference tables and historical metrics.
  • Responsible for creating Hive tables, loading the structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Involved in running MapReduce jobs for processing millions of records.
  • Migrated HiveQL queries into Impala to minimize query response time.
  • Responsible for data modeling in HBase as per requirements, and for managing and scheduling jobs on a Hadoop cluster using Oozie.
  • Worked on Spark SQL and DataFrames for faster execution of Hive queries using Spark SQLContext.
  • Performed analysis on implementing Spark using Scala and wrote sample Spark programs using PySpark (a PySpark sketch follows this section).
  • Used Solr Search & MongoDB for querying and storing data.
  • Created UDFs to calculate the pending payment for a given residential or small-business customer's quotation data, and used them in Pig and Hive scripts.
  • Moved data from Hive tables into HBase for real-time analytics.
  • Handled importing of data from various data sources and performed transformations using Hive (external tables, partitioning).
  • Responsible for data modeling in MongoDB to load data which is coming as structured and unstructured data.
  • Processed unstructured files such as XML and JSON using a custom-built Java API and pushed them into MongoDB.
  • Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes.

Environment: HDFS, MapReduce, Hive, Pig, Cloudera, Impala, Oozie, Greenplum, MongoDB, HBase, Sqoop, Flume, Maven, Python, Cloud Manager, JDK, J2EE, Struts, JSP, Servlets, Solr, WebSphere, HTML, XML, JavaScript, MRUnit testing.
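
An illustrative PySpark version of the Spark SQL work above, using the SparkSession entry point rather than the SQLContext named in the bullets; the table and column names are placeholders, not the project's schema:

```python
# Illustrative PySpark sketch of running HiveQL through Spark SQL to
# compare incremental data with a reference table, as described above.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-on-spark-sketch")
         .enableHiveSupport()      # read Hive tables via the metastore
         .getOrCreate())

# Hypothetical tables standing in for the project's incremental and
# reference data.
trends = spark.sql("""
    SELECT r.region, COUNT(*) AS orders, SUM(i.amount) AS revenue
    FROM   sales_incremental i
    JOIN   region_reference  r ON i.region_id = r.region_id
    GROUP  BY r.region
""")
trends.show()
```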

Confidential, Richfield MN

Hadoop/ Big Data Analyst

Responsibilities:

  • Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables.
  • Involved in data ingestion into HDFS using Sqoop from a variety of sources, using JDBC connectors and import parameters.
  • Responsible for managing data from various sources and their metadata.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Installed and configured Hive and wrote Hive UDFs that helped spot market trends.
  • Used Hadoop Streaming to process terabytes of data in XML format.
  • Used Hive queries in Spark SQL for analyzing and processing the data, and used Scala to perform transformations and apply business logic.
  • Implemented partitioning, dynamic partitioning, indexing, and bucketing in Hive.
  • Loaded the dataset into Hive for ETL operations.
  • Stored processed data in Parquet file format.
  • Streamed data from data source using Flume.
  • Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
  • Developed snowflake schemas by normalizing the dimension tables as appropriate.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a PySpark sketch follows this section).
  • Worked with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark Streaming.
  • Developed a Flume ETL job handling data from an HTTP source with an HDFS sink.
  • Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that invoked MapReduce jobs in the backend.

Environment: Hive, Sqoop, Pig, Puppet, HBase, MongoDB, PowerPivot, Flume.
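
A sketch of converting a simple Hive aggregate into RDD transformations, written in PySpark here although the project bullets name Scala; the HDFS path and field layout are assumptions:

```python
# PySpark sketch of converting a Hive/SQL aggregate into RDD
# transformations (the project used Scala). Path and fields are assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-conversion-sketch").getOrCreate()
sc = spark.sparkContext

# Equivalent of: SELECT product, SUM(qty) FROM sales GROUP BY product
lines = sc.textFile("hdfs:///data/sales.csv")        # assumed location
totals = (lines.map(lambda line: line.split(","))
               .map(lambda f: (f[0], int(f[1])))     # (product, qty)
               .reduceByKey(lambda a, b: a + b))
for product, qty in totals.take(10):
    print(product, qty)
```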

Confidential, Indianapolis, IN

Data Analyst

Responsibilities:

  • Focused on implementing the database/data warehouse architecture, along with client-side implementation and analysis of real-time data, to help the business make better use of analytics such as daily flash reports and new-business revenue reports.
  • Created ad hoc reports for users in Tableau by connecting to various data sources.
  • Analyzed root causes of changes in metrics and defined KPIs.
  • Worked with Business Analyst and the Business users to understand the user requirements, layout, and look and feel of the application to be developed.
  • Manipulated, cleansed, and processed data using Excel, Access, and SQL (a pandas sketch follows this section).
  • Responsible for loading, extracting, and validating data.
  • Defined the protocol to connect to the source systems and extract data in compliance with HIPAA.
  • Demonstrated knowledge of health clinics, employer wellness programs, and claims administration.

Environment: Python, Tableau 7.0, Microsoft Excel, MS SQL, Teradata, ETL, SSIS.
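
A minimal pandas sketch of the load/cleanse/validate pass described above; the source workbook, column names, and validation rule are assumptions for illustration:

```python
# Minimal sketch of the cleanse-and-validate pass above; the workbook,
# column names, and validation rule are illustrative assumptions.
import pandas as pd

raw = pd.read_excel("daily_flash_input.xlsx")        # hypothetical extract
raw.columns = [c.strip().lower().replace(" ", "_") for c in raw.columns]

clean = raw.drop_duplicates()                        # basic cleansing
clean["revenue"] = pd.to_numeric(clean["revenue"], errors="coerce")
invalid = clean[clean["revenue"].isna()]             # rows failing validation
print(f"{len(invalid)} rows failed revenue validation")
clean.dropna(subset=["revenue"]).to_csv("daily_flash_clean.csv", index=False)
```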

Confidential

Data Analyst

Responsibilities:

  • Developed business reports by writing complex SQL queries using views and volatile tables.
  • Automated and scheduled Teradata SQL scripts in UNIX using Korn shell scripting.
  • Wrote several Teradata SQL queries using Teradata SQL Assistant for ad hoc data pull requests (a Python sketch follows this section).
  • Implemented indexes, collected statistics, and applied constraints while creating tables.
  • Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server.
  • Designed and deployed rich graphic visualizations with drill-down and drop-down menu options and parameters using Tableau.
  • Created side-by-side bars, scatter plots, stacked bars, heat maps, filled maps, and symbol maps according to deliverable specifications.

Environment: Oracle 9i, MS-Office, Teradata, Tableau 6.1.10.
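
A hedged sketch of an ad hoc Teradata pull like those described above, using the teradatasql Python driver; the host, credentials, and table names are placeholders, and the volatile-table pattern mirrors the bullets:

```python
# Sketch of an ad hoc Teradata pull using a volatile table, mirroring the
# bullets above. Host, credentials, and table names are placeholders.
import teradatasql

with teradatasql.connect(host="td-host", user="user", password="pwd") as con:
    cur = con.cursor()
    # Volatile table holding an intermediate result for this session only.
    cur.execute("""
        CREATE VOLATILE TABLE vt_acct_totals AS (
            SELECT acct_id, SUM(balance) AS total_balance
            FROM   finance.accounts
            GROUP  BY acct_id
        ) WITH DATA ON COMMIT PRESERVE ROWS
    """)
    cur.execute("SELECT * FROM vt_acct_totals ORDER BY total_balance DESC")
    for row in cur.fetchmany(10):
        print(row)
```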
