Big Data Engineer Resume
CA
SUMMARY:
- 5+ years of experience across various IT technologies, including hands-on experience with Big Data technologies
- Proficient in installing, configuring and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Flume, Yarn, HBase, Sqoop, Spark, Storm, Kafka, Oozie, and Zookeeper
- Strong understanding of Hadoop daemons and MapReduce concepts
- Used Informatica PowerCenter for Extraction, Transformation, and Loading (ETL) of data from numerous sources such as flat files, XML documents, and databases
- Experienced in developing UDFs for Pig and Hive
- Strong knowledge of Spark with Scala for large-scale streaming data processing
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing (a minimal sketch follows this summary)
- Worked with NoSQL databases such as HBase, Cassandra, and MongoDB for extracting and storing large volumes of data
- Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, and tuple stores
- Ability to develop MapReduce programs using Java and Python
- Good understanding and exposure to Python programming
- Exported and imported data to and from Oracle using SQL Developer for analysis
- Developed PL/SQL programs (Functions, Procedures, Packages and Triggers)
- Good experience in using Sqoop for traditional RDBMS data pulls
- Worked with different Hadoop distributions such as Hortonworks and Cloudera
- Strong database skills in IBM DB2 and Oracle; proficient in database development, including constraints, indexes, views, stored procedures, triggers, and cursors
- Extensive use of open-source software and web/application servers such as the Eclipse 3.x IDE and Apache Tomcat 6.0
- Experience in designing components with UML: use case, class, sequence, deployment, and component diagrams for the requirements
- Involved in report development using reporting tools such as Tableau; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports
- Broad design, development, and testing experience with Talend Integration Suite and knowledge of performance tuning of mappings
- Experience with cluster monitoring tools such as Ambari and Apache Hue
- Solid technical foundation, strong analytical ability, team player, and goal-oriented, with a commitment to excellence
- Outstanding communication and presentation skills; willing to learn and adapt to new technologies and third-party products
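A minimal sketch of the Kafka-with-Spark-Streaming integration mentioned above, using Spark Structured Streaming in Scala. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic name, and checkpoint path are illustrative placeholders rather than values from any specific project.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-spark-streaming-sketch")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of records from a Kafka topic (placeholder broker and topic).
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
      .option("subscribe", "events")                    // hypothetical topic
      .option("startingOffsets", "latest")
      .load()

    // Kafka delivers key/value as binary; cast the value to string and
    // count events per one-minute window as a simple aggregation.
    val counts = raw
      .selectExpr("CAST(value AS STRING) AS event", "timestamp")
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Write the running counts to the console; a real job would write to
    // HDFS, Hive, or another durable sink.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .option("checkpointLocation", "/tmp/checkpoints/kafka-sketch") // placeholder path
      .start()

    query.awaitTermination()
  }
}
```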
PROFESSIONAL EXPERIENCE:
Big Data Engineer
Confidential, CA
Responsibilities:
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
- Worked on batch processing of data sources using Apache Spark and Elasticsearch
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a sketch of this conversion follows this list)
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2
- Loaded data from different sources (databases and files) into Hive using Talend
- Conducted POCs for ingesting data using Flume
- Used all major ETL transformations to load the tables through Informatica mappings
- Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis
- Worked on managing and reviewing Hadoop log files; tested and reported defects within an Agile methodology
- Conducted and participated in project team meetings to gather status and discuss issues and action items
- Provided support for research and resolution of testing issues
- Coordinated with the business for UAT sign-off
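A minimal sketch of the Hive-to-Spark conversion work listed above: the same aggregation expressed once through spark.sql and once as DataFrame transformations. The sales table, its columns, and the filter date are hypothetical examples, not an actual project schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark read tables registered in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // The original HiveQL, run as-is through Spark SQL
    // (sales is a hypothetical Hive table).
    val viaSql = spark.sql(
      """SELECT region, SUM(amount) AS total_amount
        |FROM sales
        |WHERE load_date = '2017-01-01'
        |GROUP BY region""".stripMargin)

    // The same logic rewritten as DataFrame transformations.
    val viaDataFrame = spark.table("sales")
      .filter(col("load_date") === "2017-01-01")
      .groupBy("region")
      .agg(sum("amount").as("total_amount"))

    viaSql.show()
    viaDataFrame.show()

    spark.stop()
  }
}
```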
Environment: Hadoop, Cloudera, Talend, Scala, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Yarn, NDM, Quality Center 9.2, Informatica, Windows & Microsoft Office
Data Analyst
Confidential, NJ
Responsibilities:
- Worked as a Data Analyst to generate data models using Oracle and developed relational database systems
- Involved in data analysis, primarily identifying datasets, source data, metadata, data formats, and data definitions
- Installed and worked with R and Tableau in creating visualizations for the data
- Documented the complete process flow to describe program development, logic, testing, implementation and application integration
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs
- Devised procedures that solve complex business problems with due considerations for hardware/ software capacity and limitations, operating times and desired results
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it
- Involved in the implementation of metadata repository, maintaining data quality, data cleaning procedures, data transformations, stored procedures, triggers and execution plans
- Responsible for data extraction, data aggregation, building of centralized data solutions and quantitative analysis to generate business insights
- Created and designed reports that use gathered metrics to infer and draw logical conclusions of past and future behavior
- Worked hands-on with the ETL process
- Worked closely with ETL, SSIS, SSRS developers to explain the data transformations using logic
- Prepared the workspace for markdown, performed data analysis and statistical analysis, and generated reports, listings, and graphs
Environment: Oracle, Tableau, R, MS Excel, SQL, MS-SQL Databases
Big Data Engineer
Confidential , OH
Responsibilities:
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN
- Worked on batch processing of data sources using Apache Spark and Elasticsearch
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala
- Worked on migrating Pig scripts and MapReduce programs to the Spark DataFrame API and Spark SQL to improve performance
- Performed data analysis and statistical analysis and generated reports and listings using SAS/SQL, SAS/ACCESS, SAS/EXCEL, pivot tables, and graphs
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2
- Loaded data from different sources (databases and files) into Hive using Talend
- Conducted POCs for ingesting data using Flume
- Used all major ETL transformations to load the tables through Informatica mappings
- Created Hive queries and tables that helped the line of business identify trends by applying strategies to historical data before promoting them to production
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement (a sketch of this partitioning and bucketing approach follows this list)
- Worked on managing and reviewing Hadoop log files; tested and reported defects within an Agile methodology
- Conducted and participated in project team meetings to gather status and discuss issues and action items
- Involved in report development using reporting tools such as Tableau; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports
- Provided support for research and resolution of testing issues
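A minimal sketch of the partitioning and bucketing idea listed above, expressed here with Spark's DataFrameWriter (Spark-managed bucketing written as Parquet) rather than Hive DDL over RCFiles; the table names, column names, and bucket count are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

object PartitionBucketSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-bucket-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // orders_raw is a hypothetical staging table in the metastore.
    val orders = spark.table("orders_raw")

    // Partition by the usual filter column and bucket by the join key, so that
    // date filters prune partitions and joins on customer_id shuffle less data.
    orders.write
      .partitionBy("order_date")    // one directory per date value
      .bucketBy(32, "customer_id")  // hash rows into 32 buckets per partition
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("orders_by_day") // hypothetical target table

    spark.stop()
  }
}
```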
Environment: Hadoop, Cloudera, Talend, Python, Spark, HDFS, Hive, Pig, Sqoop, DB2, SQL, Linux, Yarn, NDM, SAS/SQL, SAS/EXCEL, JIRA, Informatica, Windows & Microsoft Office, Tableau